AMD-K5™ Processor
x86 Architecture Extensions 





The AMD-K5™ processor is compatible with the instruction
set, programming model, memory management mechanisms,
and other software infrastructure supported by the 486 and
Pentium (735\90, 815\100) processors. Operating system and
application software that runs on the Pentium processor can be
executed on the AMD-K5 processor without modification.
Because the AMD-K5 processor takes a significantly different
approach to implementing the x86 architecture, some subtle
differences from the Pentium processor may be visible to
system and code developers. These differences are described in
Appendix A of the AMD-K5 Processor Technical Reference
Manual, order# 18524.


Call AMD at 1-800-222-9232 to order AMD-K5 processor
support documents.


Before implementing the AMD-K5 processor model-specific 
features, check CPUID for supported feature flags. See 
“CPUID” on page 29 for more information. 
Additions to the EFLAGS Register





The EFLAGS register on the AMD-K5 processor defines new 
bits in the upper 16 bits of the register to support extensions to 
the operating modes. See “Virtual-8086 Mode Extensions 
(VME)” on page 12 and “CPUID” on page 29 for additional 
information. 


Control Register 4 (CR4) Extensions 





Control Register 4 (CR4) was added in the AMD-K5 processor.
The bits in this register control the various architectural
extensions. The majority of the bits are reserved. The default
state of CR4 is all zeros. Figure 1-1 shows the register, and
Table 1-1 describes each bit and the architectural extension
it controls.





  Bits 31-8   Reserved
  Bit 7       GPE  Global Page Extension
  Bit 6       MCE  Machine Check Enable
  Bit 5       Reserved
  Bit 4       PSE  Page Size Extension
  Bit 3       DE   Debugging Extensions
  Bit 2       TSD  Time Stamp Disable
  Bit 1       PVI  Protected Virtual Interrupts
  Bit 0       VME  Virtual-8086 Mode Extensions

Figure 1-1. Control Register 4 (CR4)
Table 1-1A. Control Register 4 (CR4) Fields

Bit  Mnemonic  Description / Function

7    GPE       Global Page Extension
               Enables retention of designated entries in the
               4-Kbyte TLB or 4-Mbyte TLB during invalidations.
               1 = enabled, 0 = disabled.
               See “Global Pages” on page 8 for details.

6    MCE       Machine-Check Enable
               Enables machine-check exceptions.
               1 = enabled, 0 = disabled.
               See “Machine-Check Exceptions” on page 4 for details.

4    PSE       Page Size Extension
               Enables 4-Mbyte pages.
               1 = enabled, 0 = disabled.
               See “4-Mbyte Pages” on page 4 for details.

3    DE        Debugging Extensions
               Enables I/O breakpoints in the DR7-DR0 registers.
               1 = enabled, 0 = disabled.
               See “Debug Registers” on page 84 for details.

2    TSD       Time Stamp Disable
               Selects privileged (CPL = 0) or non-privileged
               (CPL > 0) use of the RDTSC instruction, which reads
               the Time Stamp Counter.
               1 = CPL must be 0, 0 = any CPL.
               See “Time Stamp Counter (TSC)” on page 27 for details.

1    PVI       Protected Virtual Interrupts
               Enables hardware support for interrupt virtualization
               in Protected mode.
               1 = enabled, 0 = disabled.
               See “Protected Virtual Interrupt (PVI) Extensions” on
               page 24 for details.

0    VME       Virtual-8086 Mode Extensions
               Enables hardware support for interrupt virtualization
               in Virtual-8086 mode.
               1 = enabled, 0 = disabled.
               See “Virtual-8086 Mode Extensions (VME)” on page 12 for
               details.
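As a minimal sketch of how system software might represent the CR4 fields in Table 1-1, the C fragment below defines one mask per documented bit and rejects values that touch reserved bits (bit 5 and bits 31-8). The macro and function names are illustrative assumptions, not taken from an AMD header; actually loading the value requires a privileged MOV to CR4, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4 bit masks per Figure 1-1 and Table 1-1 (names are illustrative). */
#define CR4_VME (1u << 0) /* Virtual-8086 Mode Extensions */
#define CR4_PVI (1u << 1) /* Protected Virtual Interrupts */
#define CR4_TSD (1u << 2) /* Time Stamp Disable */
#define CR4_DE  (1u << 3) /* Debugging Extensions */
#define CR4_PSE (1u << 4) /* Page Size Extension */
#define CR4_MCE (1u << 6) /* Machine-Check Enable */
#define CR4_GPE (1u << 7) /* Global Page Extension */

/* Every bit not listed above is reserved (bit 5 and bits 31-8). */
#define CR4_DEFINED \
    (CR4_VME | CR4_PVI | CR4_TSD | CR4_DE | CR4_PSE | CR4_MCE | CR4_GPE)

/* Returns true if a proposed CR4 value sets no reserved bits. */
static bool cr4_value_is_valid(uint32_t cr4)
{
    return (cr4 & ~(uint32_t)CR4_DEFINED) == 0;
}

/* Example: starting from the all-zero default, enable 4-Mbyte pages,
   global pages, and machine-check exceptions. */
static uint32_t cr4_example_value(void)
{
    uint32_t cr4 = 0; /* default state of CR4 */
    cr4 |= CR4_PSE | CR4_GPE | CR4_MCE;
    assert(cr4_value_is_valid(cr4));
    return cr4;
}
```

Keeping the reserved-bit check separate from the value construction lets the same check guard any CR4 value a kernel computes before writing it.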
Machine-Check Exceptions

Bit 6 in CR4, the machine-check enable (MCE) bit, controls
generation of machine-check exceptions (12h). If enabled by
the MCE bit, these exceptions are generated when either of
the following occurs:

■ System logic asserts BUSCHK to identify a parity or other
  type of bus-cycle error

■ The processor asserts PCHK while system logic asserts PEN
  to identify an enabled parity error on the D63-D0 data bus

Whether or not machine-check exceptions are enabled, the
processor does the following when either type of bus error
occurs:

■ Latches the physical address of the failed cycle in its 64-bit
  machine-check address register (MCAR)

■ Latches the cycle definition of the failed cycle in its 64-bit
  machine-check type register (MCTR)

Software can read the MCAR and MCTR registers in the
exception handling routine with the RDMSR instruction, as
described on page 34. The formats of the registers are shown in
Figure 1-8 and Figure 1-9.

If system software has cleared the MCE bit in CR4 to 0 before
a bus-cycle error, the processor attempts to continue execution
without generating a machine-check exception. It still latches
the address and cycle type in MCAR and MCTR as described in
this section.

4-Mbyte Pages

The TLBs in the 486 and 386 processors support only 4-Kbyte
pages. However, large data structures such as a video frame
buffer or non-paged operating system code can consume many
pages and easily overrun the TLB. The AMD-K5 processor
accommodates large data structures by allowing the operating
system to specify 4-Mbyte pages as well as 4-Kbyte pages, and
by implementing a four-entry, fully associative 4-Mbyte TLB
that is separate from the 128-entry 4-Kbyte TLB. From a
given page directory, the processor can access both 4-Kbyte
pages and 4-Mbyte pages, and the page sizes can be intermixed
within a page directory. When the Page Size Extension (PSE)
bit in CR4 is set, the processor translates linear addresses
using either the 4-Kbyte TLB or the 4-Mbyte TLB, depending
on the state of the page size (PS) bit in the page-directory
entry. Figures 1-2 and 1-3 show how 4-Kbyte and 4-Mbyte page
translation work.





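[Figure: Linear address bits 31-22 (Page Directory Offset)
select an entry in the 4-Kbyte page directory, bits 21-12
(Page Table Offset) select an entry in the 4-Kbyte page table,
and bits 11-0 (Page Offset) select a byte in the 4-Kbyte page.]

Figure 1-2. 4-Kbyte Paging Mechanism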
[Figure: Linear address bits 31-22 (Page Directory Offset)
select an entry in the 4-Mbyte page directory, and bits 21-0
(Page Offset) select a byte in the 4-Mbyte page.]

Figure 1-3. 4-Mbyte Paging Mechanism





To enable the 4-Mbyte paging option: 
1. Set the Page Size Extension (PSE) bit in CR4 to 1. 
2. Set the Page Size (PS) bit in the page-directory entry to 1. 


3. Write the physical base addresses of 4-Mbyte pages in bits 
31-22 of page-directory entries. (Bits 21-12 of these entries 
must be cleared to 0 or the processor will generate a page 
fault.) 


4. Load CR3 with the base address of the page directory that 
contains these page-directory entries. 


Figure 1-1 and Table 1-1 show the fields in CR4. Figure 1-4 and 
Table 1-2 show the fields in a page-directory entry. 
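The page-directory entry for steps 2 and 3 can be composed with simple bit operations. This C sketch assumes the caller supplies a 4-Mbyte-aligned physical base; the names make_4mb_pde and pde_4mb_is_well_formed are hypothetical. Steps 1 and 4 (writing CR4 and CR3) need privileged instructions and are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page-directory entry bits (Figure 1-4, Table 1-2); names are illustrative. */
#define PDE_P  (1u << 0) /* Present */
#define PDE_WR (1u << 1) /* Write/Read */
#define PDE_PS (1u << 7) /* Page Size: 1 = 4-Mbyte */
#define PDE_G  (1u << 8) /* Global */

#define PDE_4M_BASE_MASK 0xFFC00000u /* bits 31-22: 4-Mbyte page base */
#define PDE_4M_MBZ_MASK  0x003FF000u /* bits 21-12: must be cleared to 0 */

/* Builds a present PDE mapping one 4-Mbyte page. phys_base must be
   4-Mbyte aligned; otherwise bits 21-12 would be nonzero and the
   processor would generate a page fault when using the entry. */
static uint32_t make_4mb_pde(uint32_t phys_base, uint32_t flags)
{
    assert((phys_base & ~PDE_4M_BASE_MASK) == 0); /* 4-Mbyte aligned */
    return (phys_base & PDE_4M_BASE_MASK) | PDE_PS | PDE_P | (flags & 0xFFFu);
}

/* True if the entry is a 4-Mbyte PDE whose bits 21-12 are all zero. */
static bool pde_4mb_is_well_formed(uint32_t pde)
{
    return (pde & PDE_PS) != 0 && (pde & PDE_4M_MBZ_MASK) == 0;
}
```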
4-Kbyte page translation differs from 4-Mbyte page translation
in the following ways:

■ 4-Kbyte Paging (Figure 1-2): Bits 31-22 of the linear address
  select an entry in a 4-Kbyte page directory in memory,
  whose physical base address is stored in CR3. Bits 21-12 of
  the linear address select an entry in a 4-Kbyte page table in
  memory, whose physical base address is specified by bits
  31-12 of the page-directory entry. Bits 11-0 of the linear
  address select a byte in a 4-Kbyte page, whose physical base
  address is specified by bits 31-12 of the page-table entry.

■ 4-Mbyte Paging (Figure 1-3): Bits 31-22 of the linear
  address select an entry in a 4-Mbyte page directory in
  memory, whose physical base address is stored in CR3. Bits
  21-0 of the linear address select a byte in a 4-Mbyte page in
  memory, whose physical base address is specified by bits
  31-22 of the page-directory entry. Bits 21-12 of the
  page-directory entry must be cleared to 0.
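The field extraction described in the two bullets above reduces to shifts and masks. This is an illustrative C sketch; the structure and function names are hypothetical, and it performs only the address arithmetic, not the memory lookups the processor carries out.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 32-bit linear address (Figures 1-2 and 1-3). */
struct la_4k { uint32_t dir, table, offset; };
struct la_4m { uint32_t dir, offset; };

static struct la_4k split_4k(uint32_t la)
{
    struct la_4k f;
    f.dir    = la >> 22;            /* bits 31-22: page-directory index */
    f.table  = (la >> 12) & 0x3FFu; /* bits 21-12: page-table index */
    f.offset = la & 0xFFFu;         /* bits 11-0: byte in 4-Kbyte page */
    return f;
}

static struct la_4m split_4m(uint32_t la)
{
    struct la_4m f;
    f.dir    = la >> 22;            /* bits 31-22: page-directory index */
    f.offset = la & 0x3FFFFFu;      /* bits 21-0: byte in 4-Mbyte page */
    return f;
}

/* Physical address for a 4-Mbyte mapping: PDE bits 31-22 supply the
   page base and linear-address bits 21-0 supply the offset. */
static uint32_t translate_4m(uint32_t pde, uint32_t la)
{
    return (pde & 0xFFC00000u) | split_4m(la).offset;
}
```

For example, linear address 12345678h splits into directory index 48h, table index 345h, and offset 678h under 4-Kbyte paging, or directory index 48h and offset 345678h under 4-Mbyte paging.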





  Bits 31-12  Physical Base Address
  Bits 11-9   AVL  Available to Software
  Bit 8       G    Global
  Bit 7       PS   Page Size
  Bit 6       D    Dirty
  Bit 5       A    Accessed
  Bit 4       PCD  Page Cache Disable
  Bit 3       PWT  Page Writethrough
  Bit 2       U/S  User/Supervisor
  Bit 1       W/R  Write/Read
  Bit 0       P    Present (valid)

Figure 1-4. Page-Directory Entry (PDE)
Table 1-2A. Page-Directory Entry (PDE) Fields

Bit    Mnemonic  Description / Function

31-12  BASE      Physical Base Address
                 For 4-Kbyte pages, bits 31-12 contain the physical
                 base address of a 4-Kbyte page table.
                 For 4-Mbyte pages, bits 31-22 contain the physical
                 base address of a 4-Mbyte page, and bits 21-12 must
                 be cleared to 0. (The processor generates a page
                 fault if bits 21-12 are not cleared to 0.)

11-9   AVL       Available to Software
                 Software may use this field to store any type of
                 information. When the page-directory entry is not
                 present (P bit cleared), bits 31-1 become available
                 to software.

8      G         Global
                 0 = local, 1 = global.

7      PS        Page Size
                 0 = 4-Kbyte, 1 = 4-Mbyte.

6      D         Dirty
                 For 4-Kbyte pages, this bit is undefined and
                 ignored; the processor does not change it.
                 For 4-Mbyte pages, the processor sets this bit to 1
                 during a write to the page that is mapped by this
                 page-directory entry.
                 0 = not written, 1 = written.

5      A         Accessed
                 The processor sets this bit to 1 during a read or
                 write to any page that is mapped by this
                 page-directory entry.
                 0 = not read or written, 1 = read or written.

4      PCD       Page Cache Disable
                 Specifies cacheability for all pages mapped by this
                 page-directory entry. Whether a location in a mapped
                 page is actually cached also depends on several
                 other factors.
                 0 = cacheable page, 1 = non-cacheable.

3      PWT       Page Writethrough
                 Specifies writeback or writethrough cache protocol
                 for all pages mapped by this page-directory entry.
                 Whether a location in a mapped page is actually
                 cached in a writeback or writethrough state also
                 depends on several other factors.
                 0 = writeback page, 1 = writethrough page.

2      U/S       User/Supervisor
                 0 = supervisor (CPL < 3), 1 = user (any CPL).

1      W/R       Write/Read
                 0 = read or execute, 1 = write, read, or execute.

0      P         Present
                 0 = not valid, 1 = valid.
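A decoder for the fields in Table 1-2 might look like the sketch below. The struct and function names are hypothetical. The D bit is reported only for 4-Mbyte entries because it is undefined for 4-Kbyte entries, and the decoded fields are meaningful only when P = 1 (when P = 0, bits 31-1 belong to software).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pde_fields {
    bool present, writable, pwt, pcd, accessed, dirty, large, global;
    uint32_t avl;  /* bits 11-9, available to software */
    uint32_t base; /* page-table base (4-Kbyte) or page base (4-Mbyte) */
};

static struct pde_fields decode_pde(uint32_t pde)
{
    struct pde_fields f;
    f.present  = (pde >> 0) & 1u;  /* P */
    f.writable = (pde >> 1) & 1u;  /* W/R: 1 = write, read, or execute */
    f.pwt      = (pde >> 3) & 1u;  /* PWT: 1 = writethrough */
    f.pcd      = (pde >> 4) & 1u;  /* PCD: 1 = non-cacheable */
    f.accessed = (pde >> 5) & 1u;  /* A */
    f.large    = (pde >> 7) & 1u;  /* PS: 1 = 4-Mbyte page */
    /* D is defined only for 4-Mbyte pages. */
    f.dirty    = f.large && ((pde >> 6) & 1u);
    f.global   = (pde >> 8) & 1u;  /* G */
    f.avl      = (pde >> 9) & 7u;
    /* 4-Mbyte entries use bits 31-22; 4-Kbyte entries use bits 31-12. */
    f.base     = f.large ? (pde & 0xFFC00000u) : (pde & 0xFFFFF000u);
    return f;
}
```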

Global Pages 


The processor’s performance can sometimes be improved by 
making some pages global to all tasks and procedures. This can 
be done for both 4-Kbyte pages and 4-Mbyte pages. 
The processor invalidates (flushes) both the 4-Kbyte TLB and
the 4-Mbyte TLB whenever CR3 is loaded with the base
address of the new task’s page directory. The processor loads
CR3 automatically during task switches, and the operating
system can load CR3 at any other time. Unnecessary invalidation
of certain TLB entries can be avoided by specifying those
entries as global (a global TLB entry references a global page).
This improves performance after TLB flushes, because global
entries remain in the TLB and need not be reloaded. For example,
entries may reference operating system code and data pages
that are always required. The processor operates faster if these
entries are retained across task switches and procedure calls.


To specify individual pages as global: 

1. Set the Global Page Extension (GPE) bit in CR4. 

2. (Optional) Set the Page Size Extension (PSE) bit in CR4. 
3. Set the relevant Global (G) bit for that page: 


For 4-Kbyte pages: Set the G bit in both the page-directory
entry (shown in Figure 1-4 and Table 1-2) and the page-table
entry (shown in Figure 1-5 and Table 1-3).

For 4-Mbyte pages: (Optional) After the PSE bit in CR4 is
set, set the G bit in the page-directory entry (shown in
Figure 1-4 and Table 1-2).


4. Load CR3 with the base address of the page directory. 


The INVLPG instruction clears both the V and G bits for the 
referenced entry. To invalidate all entries, including global- 
page entries, in both TLBs: 


1. Clear the Global Page Extension (GPE) bit in CR4. 


2. Load CR3 with the base address of another (or same) page 
directory. 
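The interaction between the GPE bit, the G bit, CR3 loads, and INVLPG can be illustrated with a small software model of a TLB. This is a toy simulation in C, not processor code: the entry layout, the four-entry array, and all names are assumptions made for illustration, and a single flat array stands in for both the 4-Kbyte and 4-Mbyte TLBs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_GPE     (1u << 7)
#define TLB_ENTRIES 4 /* toy size; the real TLBs are larger */

struct tlb_entry { bool valid; bool global; uint32_t tag; };

/* Models a CR3 load: non-global entries are always invalidated, and
   global entries survive only while CR4.GPE is set. */
static void model_cr3_load(struct tlb_entry *tlb, size_t n, uint32_t cr4)
{
    for (size_t i = 0; i < n; i++) {
        if (!tlb[i].global || (cr4 & CR4_GPE) == 0)
            tlb[i].valid = false;
    }
}

/* Models INVLPG: clears both the V and G bits of the matching entry. */
static void model_invlpg(struct tlb_entry *tlb, size_t n, uint32_t tag)
{
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].tag == tag) {
            tlb[i].valid = false;
            tlb[i].global = false;
        }
    }
}

static size_t count_valid(const struct tlb_entry *tlb, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += tlb[i].valid;
    return count;
}
```

Running the two-step "invalidate everything" procedure in this model means calling model_cr3_load with a CR4 value whose GPE bit is clear, which leaves no valid entries.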
  Bits 31-12  Physical Base Address
  Bits 11-9   AVL  Available to Software
  Bit 8       G    Global
  Bit 7       PS   Page Size (= 0)
  Bit 6       D    Dirty
  Bit 5       A    Accessed
  Bit 4       PCD  Page Cache Disable
  Bit 3       PWT  Page Writethrough
  Bit 2       U/S  User/Supervisor
  Bit 1       W/R  Write/Read
  Bit 0       P    Present (valid)

Figure 1-5. Page-Table Entry (PTE)
Table 1-3. Page-Table Entry (PTE) Fields

Bit    Mnemonic  Description / Function

31-12  BASE      Physical Base Address
                 The physical base address of a 4-Kbyte page.

11-9   AVL       Available to Software
                 Software may use this field to store any type of
                 information. When the page-table entry is not
                 present (P bit cleared), bits 31-1 become available
                 to software.

8      G         Global
                 0 = local, 1 = global.

7      PS        Page Size
                 This bit is ignored in page-table entries, although
                 clearing it to 0 preserves consistent usage of this
                 bit between page-table and page-directory entries.

6      D         Dirty
                 The processor sets this bit to 1 during a write to
                 the page that is mapped by this page-table entry.
                 0 = not written, 1 = written.

5      A         Accessed
                 The processor sets this bit to 1 during a read or
                 write to any page that is mapped by this page-table
                 entry.
                 0 = not read or written, 1 = read or written.

4      PCD       Page Cache Disable
                 Specifies cacheability for all locations in the page
                 mapped by this page-table entry. Whether a location
                 is actually cached also depends on several other
                 factors.
                 0 = cacheable page, 1 = non-cacheable.

3      PWT       Page Writethrough
                 Specifies writeback or writethrough cache protocol
                 for all locations in the page mapped by this
                 page-table entry. Whether a location is actually
                 cached in a writeback or writethrough state also
                 depends on several other factors.
                 0 = writeback, 1 = writethrough.

2      U/S       User/Supervisor
                 0 = supervisor (CPL < 3), 1 = user (any CPL).

1      W/R       Write/Read
                 0 = read or execute, 1 = write, read, or execute.

0      P         Present
                 0 = not valid, 1 = valid.
Virtual-8086 Mode Extensions (VME)

The Virtual-8086 Mode Extensions (VME) bit in CR4 (bit 0)
enables performance enhancements for 8086 programs running
as protected tasks in Virtual-8086 mode. These extensions
include:

■ Virtualizing maskable external interrupt control and
  notification via the VIF and VIP bits in EFLAGS

■ Selectively intercepting software interrupts (INTn
  instructions) via the Interrupt Redirection Bitmap (IRB) in
  the Task State Segment (TSS)

Interrupt Redirection in Virtual-8086 Mode Without VME Extensions

8086 programs expect to have full access to the interrupt flag
(IF) in the EFLAGS register, which enables maskable external
interrupts via the INTR signal. When 8086 programs run in
Virtual-8086 mode on a 386 or 486 processor, they run as
protected tasks, and the operating system must control their
access to the IF flag on a task-by-task basis to prevent
corruption of system resources.


Without the VME extensions available on the AMD-K5
processor, the operating system controls Virtual-8086 mode
access to the IF flag by trapping instructions that can read or
write this flag. These instructions include STI, CLI, PUSHF,
POPF, INTn, and IRET. This method prevents changes to the real
IF when the I/O privilege level (IOPL) in EFLAGS is less than 3,
the privilege level at which all Virtual-8086 tasks run. The
operating system maintains an image of the IF flag for each
Virtual-8086 program by emulating the instructions that read or
write IF. When an external maskable interrupt occurs, the
operating system checks the state of the IF image for the current
Virtual-8086 program to determine whether the program is
allowing interrupts. If the program has disabled interrupts, the
operating system saves the interrupt information until the
program attempts to re-enable interrupts.


The overhead of trapping and emulating the instructions that
enable and disable interrupts, and of maintaining virtual
interrupt flags for each Virtual-8086 program, can degrade the
processor’s performance. This performance can be regained by
running Virtual-8086 programs with IOPL set to 3, thus allowing
changes to the real IF flag from any privilege level, but with a
loss in protection.
In addition to these performance problems caused by
virtualization of the IF flag in Virtual-8086 mode, software
interrupts (those caused by INTn instructions that vector
through interrupt gates) cannot be masked by the IF flag or by
virtual copies of it; these flags affect only hardware interrupts.
Software interrupts in Virtual-8086 mode are normally directed
to the Real mode interrupt vector table (IVT), but it may be
desirable to redirect interrupts for certain vectors to the
Protected mode interrupt descriptor table (IDT).


The processor’s Virtual-8086 mode extensions support both of
these cases (hardware or external interrupts, and software
interrupts) with mechanisms that preserve high performance
without compromising protection. Virtualization of hardware
interrupts is supported via the Virtual Interrupt Flag (VIF)
and Virtual Interrupt Pending (VIP) flag in the EFLAGS
register. Redirection of software interrupts is supported with the
Interrupt Redirection Bitmap (IRB) in the TSS of each
Virtual-8086 program.


When VME extensions are enabled, the IF-modifying
instructions that are normally trapped by the operating system
are allowed to execute, but they write and read the VIF bit rather
than the IF bit in EFLAGS. This leaves maskable interrupts
enabled for detection by the operating system. It also indicates
to the operating system whether the Virtual-8086 program is
able or expecting to receive interrupts.


When an external interrupt occurs, the processor switches
from the Virtual-8086 program to the operating system, in the
same manner as on a 386 or 486 processor. If the operating
system determines that the interrupt is for the Virtual-8086
program, it checks the state of the VIF bit in the program’s
EFLAGS image on the stack. If VIF has been set by the
processor (during an attempt by the program to set the IF bit),
the operating system permits access to the appropriate
Virtual-8086 handler via the interrupt vector table (IVT). If VIF
has been cleared, the operating system holds the interrupt
pending. The operating system can do this by saving appropriate
information (such as the interrupt vector), setting the
program's VIP flag in the EFLAGS image on the stack, and
returning to the interrupted program. When the program
subsequently attempts to set IF, the set VIP flag causes the
processor to inhibit the instruction and generate a
general-protection exception with error code zero, thereby
notifying the operating system that the program is now prepared
to accept the interrupt.
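The operating-system side of this protocol can be sketched in C as two routines that act on the EFLAGS image saved on the stack. The function names are hypothetical, and a real handler would also have to save the interrupt vector, reflect the interrupt through the IVT, and emulate or skip the inhibited instruction; those steps are omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_VIF (1u << 19)
#define EFLAGS_VIP (1u << 20)

enum irq_action { DELIVER_VIA_IVT, HOLD_PENDING };

/* Hypothetical OS decision for an external interrupt that belongs to the
   interrupted Virtual-8086 program; eflags_image is the saved EFLAGS. */
static enum irq_action v86_external_irq(uint32_t *eflags_image)
{
    if (*eflags_image & EFLAGS_VIF)
        return DELIVER_VIA_IVT;  /* program has "enabled" interrupts */
    *eflags_image |= EFLAGS_VIP; /* hold pending; a later STI faults */
    return HOLD_PENDING;
}

/* Hypothetical GP(0) path: the program tried to set VIF while VIP was
   set, so it is now ready for the held interrupt. This sketch assumes
   the OS completes the inhibited STI by setting VIF and clearing VIP. */
static void v86_pending_ready(uint32_t *eflags_image)
{
    *eflags_image &= ~EFLAGS_VIP;
    *eflags_image |= EFLAGS_VIF;
}
```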


Thus, when VME extensions are enabled, the VIF and VIP bits
are set and cleared as follows:

■ VIF: This bit is controlled by the processor and used by the
  operating system to determine whether an external
  maskable interrupt should be passed on to the program or
  held pending. VIF is set and cleared for instructions that
  can modify IF, and it is cleared during software interrupts
  through interrupt gates. The original IF value is preserved
  in the EFLAGS image on the stack.

■ VIP: This bit is set and cleared by the operating system via
  the EFLAGS image on the stack. It is set when an interrupt
  occurs for a Virtual-8086 program whose VIF bit is cleared.
  The processor checks the bit when the program
  subsequently attempts to set VIF.


Figure 1-6 and Table 1-4 show the VIF and VIP bits in the
EFLAGS register. The VME extensions support conventional
emulation methods for passing interrupts to Virtual-8086
programs, but they make it possible for the operating system to
avoid time-consuming emulation of most instructions that
write or read IF.


The VIF and IF flags affect only the way the operating system
deals with hardware interrupts (the INTR signal). Software
interrupts are handled like machine-generated exceptions and
cannot be masked by real or virtual copies of IF (see “Software
Interrupts and the Interrupt Redirection Bitmap (IRB)
Extension” on page 20). The VIF and VIP flags only ease the
software overhead associated with managing interrupts, so that
the operating system does not have to maintain virtual copies
of the IF flag. Instead, each task’s TSS holds its own copy of
these flags in its EFLAGS image.
  Bits 31-22  Reserved
  Bit 21      ID    ID Flag
  Bit 20      VIP   Virtual Interrupt Pending
  Bit 19      VIF   Virtual Interrupt Flag
  Bit 18      AC    Alignment Check
  Bit 17      VM    Virtual-8086 Mode
  Bit 16      RF    Resume Flag
  Bit 15      Reserved
  Bit 14      NT    Nested Task
  Bits 13-12  IOPL  I/O Privilege Level
  Bit 11      OF    Overflow Flag
  Bit 10      DF    Direction Flag
  Bit 9       IF    Interrupt Flag
  Bit 8       TF    Trap Flag
  Bit 7       SF    Sign Flag
  Bit 6       ZF    Zero Flag
  Bit 5       Reserved
  Bit 4       AF    Auxiliary Flag
  Bit 3       Reserved
  Bit 2       PF    Parity Flag
  Bit 1       Reserved
  Bit 0       CF    Carry Flag

Figure 1-6. EFLAGS Register





Table 1-4A. Virtual-Interrupt Additions to EFLAGS Register

Bit  Mnemonic  Description / Function

20   VIP       Virtual Interrupt Pending
               Set by the operating system (via the EFLAGS image on
               the stack) when an external maskable interrupt (INTR)
               occurs for a Virtual-8086 program whose VIF bit is
               cleared. The bit is checked by the processor when the
               program subsequently attempts to set VIF.

19   VIF       Virtual Interrupt Flag
               When the VME bit in CR4 is set, the VIF bit is
               modified by the processor when a Virtual-8086 program
               running at less privilege than the IOPL attempts to
               modify the IF bit. The VIF bit is used by the
               operating system to determine whether a maskable
               interrupt should be passed on to the program or held
               pending.
Table 1-5A through Table 1-5E show the effects, in various
x86-processor modes, of instructions that read or write the IF
and VIF flags. The column headings in these tables include the
following values:

PE: Protection Enable bit in CR0 (bit 0)
VM: Virtual-8086 Mode bit in EFLAGS (bit 17)
VME: Virtual-8086 Mode Extensions bit in CR4 (bit 0)
PVI: Protected-mode Virtual Interrupts bit in CR4 (bit 1)
IOPL: I/O Privilege Level bits in EFLAGS (bits 13-12)
Handler CPL: Current Privilege Level of the interrupt handler
GP(0): General-protection exception, with error code = 0
IF: Interrupt Flag bit in EFLAGS (bit 9)
VIF: Virtual Interrupt Flag bit in EFLAGS (bit 19)

Table 1-5A. Instructions that Modify the IF or VIF Flags—Real Mode 
TYPE    PE  VM  VME  PVI  IOPL  GP(0)  IF      VIF
CLI     0   0   0    0    -     No     IF ← 0  -
STI     0   0   0    0    -     No     IF ← 1  -
PUSHF   0   0   0    0    -     No     Pushed  -
POPF    0   0   0    0    -     No     Popped  -
IRET    0   0   0    0    -     No     Popped  -
Notes:
- Not applicable.
Table 1-5B. Instructions that Modify the IF or VIF Flags—Protected Mode

TYPE    PE  VM  VME  PVI  IOPL   Handler CPL  GP(0)  IF          VIF
CLI     1   0   -    0    ≥ CPL  -            No     IF ← 0      -
CLI     1   0   -    0    < CPL  -            Yes    -           -
STI     1   0   -    0    ≥ CPL  -            No     IF ← 1      -
STI     1   0   -    0    < CPL  -            Yes    -           -
PUSHF   1   0   -    0    ≥ CPL  -            No     Pushed      -
PUSHF   1   0   -    0    < CPL  -            No     Pushed      -
PUSHFD  1   0   -    0    ≥ CPL  -            No     Pushed      Pushed
PUSHFD  1   0   -    0    < CPL  -            No     Pushed      Pushed
POPF    1   0   -    0    ≥ CPL  -            No     Popped      -
POPF    1   0   -    0    < CPL  -            No     Not Popped  -
POPFD   1   0   -    0    ≥ CPL  -            No     Popped      Not Popped
POPFD   1   0   -    0    < CPL  -            No     Not Popped  Not Popped
IRET    1   0   -    0    -      = 0          No     Popped      -
IRET    1   0   -    0    ≥ CPL  > 0          No¹    Popped      -
IRET    1   0   -    0    < CPL  > 0          No¹    Not Popped  -
IRETD   1   0   -    0    -      = 0          No     Popped      Popped
IRETD   1   0   -    0    ≥ CPL  > 0          No¹    Popped      Not Popped
IRETD   1   0   -    0    < CPL  > 0          No¹    Not Popped  Not Popped
Notes:
1. GP(0) if the CPL of the task executing IRETD is greater than the CPL of the task returned to.
- Not applicable.
Table 1-5C. Instructions that Modify the IF or VIF Flags—Virtual-8086 Mode¹

TYPE     PE  VM  VME  PVI  IOPL  GP(0)  IF      VIF
CLI      1   1   0    -    3     No     IF ← 0  No Change
CLI      1   1   0    -    <3    Yes    -       -
STI      1   1   0    -    3     No     IF ← 1  No Change
STI      1   1   0    -    <3    Yes    -       -
PUSHF    1   1   0    -    3     No     Pushed  -
PUSHF    1   1   0    -    <3    Yes    -       -
PUSHFD   1   1   0    -    3     No     Pushed  Pushed
PUSHFD   1   1   0    -    <3    Yes    -       -
POPF     1   1   0    -    3     No     Popped  -
POPF     1   1   0    -    <3    Yes    -       -
POPFD    1   1   0    -    3     No     Popped  Not Popped
POPFD    1   1   0    -    <3    Yes    -       -
IRETD²   1   1   0    -    -     No     Popped  Popped
Notes:
1. All Virtual-8086 mode tasks run at CPL = 3.
2. All protected virtual interrupt handlers run at CPL = 0.
- Not applicable.
Table 1-5D. Instructions that Modify the IF or VIF Flags—Virtual-8086 Mode Interrupt
Extensions (VME)¹

TYPE                        PE  VM  VME  PVI  IOPL  GP(0)  IF          VIF
CLI                         1   1   1    -    3     No     IF ← 0      No Change
CLI                         1   1   1    -    <3    No     No Change   VIF ← 0
STI                         1   1   1    -    3     No     IF ← 1      No Change
STI                         1   1   1    -    <3    No³    No Change   VIF ← 1
PUSHF                       1   1   1    -    3     No     Pushed      Not Pushed
PUSHF                       1   1   1    -    <3    No     Not Pushed  Pushed into IF
PUSHFD                      1   1   1    -    3     No     Pushed      Pushed
PUSHFD                      1   1   1    -    <3    Yes    -           -
POPF                        1   1   1    -    3     No     Popped      Not Popped
POPF                        1   1   1    -    <3    No     Not Popped  Popped from IF
POPFD                       1   1   1    -    3     No     Popped      Not Popped
POPFD                       1   1   1    -    <3    Yes    -           -
IRET                        1   1   1    -    3     No     Popped      Not Popped
IRET                        1   1   1    -    <3    No³    Not Popped  Popped from IF
IRETD                       1   1   1    -    3     No     Popped      Not Popped
IRETD                       1   1   1    -    <3    Yes    -           -
IRETD from Protected Mode²  1   1   1    -    -     No³    Popped      Popped
Notes:
1. All Virtual-8086 mode tasks run at CPL = 3.
2. All protected virtual interrupt handlers run at CPL = 0.
3. GP(0) if an attempt is made to set VIF when VIP = 1.
- Not applicable.
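The CLI and STI rows of Table 1-5D can be expressed as a small decision function. This C sketch models only those two instructions for a Virtual-8086 task (CPL = 3) with CR4.VME = 1; the function and enumerator names are hypothetical, and it operates on an EFLAGS value held in memory rather than on processor state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF         (1u << 9)
#define EFLAGS_VIF        (1u << 19)
#define EFLAGS_VIP        (1u << 20)
#define EFLAGS_IOPL_SHIFT 12

enum outcome { OUTCOME_OK, OUTCOME_GP0 };

/* Models STI (set = true) or CLI (set = false) executed by a Virtual-8086
   task while CR4.VME = 1, per the CLI/STI rows of Table 1-5D.
   On GP(0) the EFLAGS value is left unchanged. */
static enum outcome v86_vme_cli_sti(uint32_t *eflags, bool set)
{
    uint32_t iopl = (*eflags >> EFLAGS_IOPL_SHIFT) & 3u;

    if (iopl == 3) {              /* IOPL = 3: real IF changes, VIF does not */
        if (set)
            *eflags |= EFLAGS_IF;
        else
            *eflags &= ~EFLAGS_IF;
        return OUTCOME_OK;
    }
    if (!set) {                   /* CLI with IOPL < 3: VIF <- 0 */
        *eflags &= ~EFLAGS_VIF;
        return OUTCOME_OK;
    }
    if (*eflags & EFLAGS_VIP)     /* note 3: setting VIF while VIP = 1 */
        return OUTCOME_GP0;
    *eflags |= EFLAGS_VIF;        /* STI with IOPL < 3: VIF <- 1 */
    return OUTCOME_OK;
}
```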
Table 1-5E. Instructions that Modify the IF or VIF Flags—Protected Mode Virtual
Interrupt Extensions (PVI)¹

TYPE     PE  VM  VME  PVI  IOPL  GP(0)  IF          VIF
CLI      1   0   -    1    3     No     IF ← 0      No Change
CLI      1   0   -    1    <3    No     No Change   VIF ← 0
STI      1   0   -    1    3     No     IF ← 1      No Change
STI      1   0   -    1    <3    No³    No Change   VIF ← 1
PUSHF    1   0   -    1    3     No     Pushed      Not Pushed
PUSHF    1   0   -    1    <3    No     Pushed      Not Pushed
PUSHFD   1   0   -    1    3     No     Pushed      Pushed
PUSHFD   1   0   -    1    <3    No     Pushed      Pushed
POPF     1   0   -    1    3     No     Popped      Not Popped
POPF     1   0   -    1    <3    No     Not Popped  Not Popped
POPFD    1   0   -    1    3     No     Popped      Not Popped
POPFD    1   0   -    1    <3    No     Not Popped  Not Popped
IRETD²   1   0   -    1    -     No³    Popped      Popped
Notes:
1. All Protected mode virtual interrupt tasks run at CPL = 3.
2. All protected mode virtual interrupt handlers run at CPL = 0.
3. GP(0) if an attempt is made to set VIF when VIP = 1.
- Not applicable.

Software Interrupts and the Interrupt Redirection Bitmap (IRB) Extension

In Virtual-8086 mode, software interrupts (INTn instructions
that vector through interrupt gates) are trapped by the
operating system for emulation, because they would otherwise
clear the real IF. When VME extensions are enabled, these INTn
instructions are allowed to execute normally, vectoring
directly to a Virtual-8086 service routine via the Virtual-8086
interrupt vector table (IVT) at address 0 of the task address
space. However, it may still be desirable for security or
performance reasons to intercept INTn instructions on a
vector-specific basis so that they can be serviced by
Protected-mode routines accessed through the interrupt
descriptor table (IDT). This is accomplished by an Interrupt
Redirection Bitmap (IRB) in the TSS, which the operating system
creates in a manner similar to the I/O Permission Bitmap (IOPB)
in the TSS.


Figure 1-7 shows the format of the TSS, with the Interrupt
Redirection Bitmap near the top. The IRB contains 256 bits,
one for each possible software-interrupt vector. The
most-significant bit of the IRB is located immediately below the
base of the IOPB. This bit controls interrupt vector 255. The
least-significant bit of the IRB controls interrupt vector 0.


The bits in the IRB work as follows:

■ Set: If set to 1, the INTn instruction behaves as if the VME
  extensions are not enabled. The interrupt vectors to a
  Protected-mode routine if IOPL = 3, or it causes a
  general-protection exception with error code zero if IOPL < 3.

■ Cleared: If cleared to 0, the INTn instruction vectors
  directly to the corresponding Virtual-8086 service routine
  via the Virtual-8086 program’s IVT.
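Locating the redirection bit for a vector, and applying the two rules above, takes only a few lines. This C sketch assumes the operating system already knows the offset of the IOPB base within the TSS (passed as iopb_base) and that the TSS is available as a byte array; all names are hypothetical. Because the IRB's most-significant bit (vector 255) sits immediately below the IOPB base, the 32-byte bitmap starts at iopb_base - 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum int_route { ROUTE_IVT, ROUTE_IDT, ROUTE_GP0 };

/* Returns the IRB bit for a vector. Vector 0 is bit 0 of the byte at
   iopb_base - 32; vector 255 is bit 7 of the byte at iopb_base - 1. */
static bool irb_bit(const uint8_t *tss, uint32_t iopb_base, uint8_t vector)
{
    assert(iopb_base >= 32);
    const uint8_t *irb = tss + (iopb_base - 32u);
    return (irb[vector >> 3] >> (vector & 7u)) & 1u;
}

/* Routing of INTn in Virtual-8086 mode with CR4.VME = 1. */
static enum int_route v86_vme_intn(const uint8_t *tss, uint32_t iopb_base,
                                   uint8_t vector, unsigned iopl)
{
    if (!irb_bit(tss, iopb_base, vector))
        return ROUTE_IVT;                     /* cleared: Virtual-8086 IVT */
    return iopl == 3 ? ROUTE_IDT : ROUTE_GP0; /* set: as if VME were off */
}
```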


Only software interrupts can be redirected via the IRB to a
Real mode IVT; hardware interrupts cannot. Hardware
interrupts are asynchronous events and do not belong to any
current virtual task. The processor thus has no way of deciding
which IVT (for which Virtual-8086 program) to direct a
hardware interrupt to. Because of this, hardware interrupts
always require operating system intervention. The VIF and VIP
bits described in “Hardware Interrupts and the VIF and VIP
Extensions” on page 13 are provided to assist the operating
system in this intervention.










AMD-K5 Processor Software Development Guide 20007D/0—Sep1996 
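[Figure: TSS layout (bits 31-0), from the TSS limit downward:
the I/O Permission Bitmap (IOPB), up to 8 Kbytes; the Interrupt
Redirection Bitmap (IRB), eight 32-bit locations immediately
below the IOPB base; an operating-system data structure; and the
standard TSS fields, including EDI, ESI, EBP, ESP, EBX, EDX, ECX,
EAX, EFLAGS, EIP, and CR3.]

Figure 1-7. Task State Segment (TSS)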
Table 1-6 compares the behavior of hardware and software
interrupts in various x86-processor operating modes. It also
shows which interrupt table is accessed: the Protected-mode
IDT or the Real- and Virtual-8086-mode IVT. The column
headings in this table include:

PE: Protection Enable bit in CR0 (bit 0)
VM: Virtual-8086 Mode bit in EFLAGS (bit 17)
VME: Virtual-8086 Mode Extensions bit in CR4 (bit 0)
PVI: Protected-Mode Virtual Interrupts bit in CR4 (bit 1)
IOPL: I/O Privilege Level bits in EFLAGS (bits 13-12)
IRB: Interrupt Redirection Bit for a task, from the Interrupt
  Redirection Bitmap (IRB) in the task’s TSS
GP(0): General-protection exception, with error code = 0
IDT: Protected-Mode Interrupt Descriptor Table
IVT: Real- and Virtual-8086 Mode Interrupt Vector Table





Table 1-6A. Interrupt Behavior and Interrupt-Table Access

Mode / Interrupt Type     PE  VM  VME  PVI  IOPL  IRB  GP(0)  IDT  IVT
Real mode
  Software                0   0   0    -    -     -    -      -    ✓
  Hardware                0   0   0    -    -     -    -      -    ✓
Protected mode
  Software                1   0   0    -    -     -    -      ✓    -
  Hardware                1   0   0    -    -     -    -      ✓    -
Virtual-8086 mode¹
  Software                1   1   0    -    3     -    No     ✓    -
  Software                1   1   0    -    <3    -    Yes    ✓    -
  Hardware                1   1   0    -    -     -    No     ✓    -
Virtual-8086 Mode Extensions (VME)¹
  Software                1   1   1    0    -     0    No     -    ✓
  Software                1   1   1    0    3     1    No     ✓    -
  Software                1   1   1    0    <3    1    Yes    ✓    -
  Hardware                1   1   1    0    -     -    No     ✓    -
Protected Virtual Extensions (PVI)
  Software                1   0   1    1    -     -    No     ✓    -
  Hardware                1   0   1    1    -     -    No     ✓    -
Notes:
1. All Virtual-8086 tasks run at CPL = 3.
- Not applicable.
Protected Virtual Interrupt (PVI) Extensions

The Protected Virtual Interrupts (PVI) bit in CR4 enables
support for interrupt virtualization in Protected mode. In this
virtualization, the processor maintains program-specific VIF and
VIP flags in a manner similar to those in Virtual-8086 Mode
Extensions (VME). When a program is executed at CPL = 3, it
can set and clear its copy of the VIF flag without causing
general-protection exceptions.

The only differences between the VME and PVI extensions are
that, in PVI, selective INTn interception using the Interrupt
Redirection Bitmap in the TSS does not apply, and only the STI
and CLI instructions are affected by the extension.

Table 1-5A through Table 1-5E and Table 1-6 show, among
other things, the behavior of hardware and software
interrupts, and of instructions that affect interrupts, in
Protected mode with the PVI extensions enabled.
Model-Specific Registers (MSRs)





The processor supports model-specific registers (MSRs) that 
can be accessed with the RDMSR and WRMSR instructions 
when CPL = 0. The following index values in the ECX register 
access specific MSRs: 


00h: Machine-Check Address Register (MCAR) 
01h: Machine-Check Type Register (MCTR) 
10h: Time Stamp Counter (TSC) 

82h: Array Access Register (AAR) 

83h: Hardware Configuration Register (HWCR) 


The RDMSR and WRMSR instructions are described on page 
34. The following sections describe the format of the registers. 
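RDMSR and WRMSR, and also RDTSC, carry a 64-bit value in the EDX:EAX register pair. Below is a minimal C sketch of joining and splitting that pair. The names msr_join and msr_split are hypothetical helpers, not part of any AMD interface.

```c
#include <stdint.h>

/* EDX holds bits 63-32 and EAX holds bits 31-0 of the 64-bit MSR value. */
static inline uint64_t msr_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Inverse of msr_join: produce the EDX and EAX values for WRMSR. */
static inline void msr_split(uint64_t value, uint32_t *edx, uint32_t *eax)
{
    *edx = (uint32_t)(value >> 32);
    *eax = (uint32_t)value;
}
```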


Machine-Check Address Register (MCAR) 


The processor latches the address of the current bus cycle in 
its 64-bit Machine-Check Address Register (MCAR) when a 
bus-cycle error occurs. These errors are indicated either by (a) 
system logic asserting BUSCHK, or (b) the processor asserting 
PCHK while system logic asserts PEN. 


The MCAR can be read with the RDMSR instruction when the
ECX register contains the value 00h. Figure 1-8 shows the for-
mat of the MCAR register.


If system software has set the MCE bit in CR4 before the bus- 
cycle error, the processor also generates a machine-check 
exception as described on page 4. 





Physical Address of Last Bus Cycle that Failed 


Figure 1-8. Machine-Check Address Register (MCAR) 
Machine-Check Type Register (MCTR)


The processor latches the cycle definition and other informa-
tion about the current bus cycle in its 64-bit Machine-Check
Type Register (MCTR) at the same time that the Machine-
Check Address Register (MCAR) latches the cycle address,
that is, when a bus-cycle error occurs. These errors are indi-
cated either by (a) system logic asserting BUSCHK, or (b) the
processor asserting PCHK while system logic asserts PEN.


The MCTR can be read with the RDMSR instruction when the
ECX register contains the value 01h. Figure 1-9 and Table 1-7
show the format of the MCTR register. The processor clears
the CHK bit (bit 0) in MCTR when the register is read with
the RDMSR instruction.


If system software has set the MCE bit in CR4 before the bus- 
cycle error, the processor also generates a machine-check 
exception as described on page 4. 











Bits 63-5: Reserved
Bit 4: LOCK (Locked Cycle)
Bit 3: M/IO (Memory or I/O Cycle)
Bit 2: D/C (Data or Code Cycle)
Bit 1: W/R (Write or Read Cycle)
Bit 0: CHK (Valid Machine-Check Data)


Figure 1-9. Machine-Check Type Register (MCTR)
Table 1-7A. Machine-Check Type Register (MCTR) Fields

Bit  Mnemonic  Description               Function
4    LOCK      Locked Cycle              1 if the processor was asserting LOCK during the bus cycle.
3    M/IO      Memory or I/O             1 = memory cycle, 0 = I/O cycle.
2    D/C       Data or Code              1 = data cycle, 0 = code cycle.
1    W/R       Write or Read             1 = write cycle, 0 = read cycle.
0    CHK       Valid Machine-Check Data  The processor sets the CHK bit to 1 when both the MCTR and
                                         MCAR registers contain valid information. The processor
                                         clears the CHK bit to 0 when software reads the MCTR with
                                         the RDMSR instruction.
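RDMSR clears CHK, so a machine-check handler would typically read MCTR once and then decode the saved value. Below is a minimal C sketch of that decoding. It takes the EAX half returned by RDMSR with ECX = 01h and uses the bit positions from Table 1-7A. The macro and type names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCTR_CHK  (1u << 0)  /* valid machine-check data */
#define MCTR_WR   (1u << 1)  /* 1 = write cycle, 0 = read cycle */
#define MCTR_DC   (1u << 2)  /* 1 = data cycle, 0 = code cycle */
#define MCTR_MIO  (1u << 3)  /* 1 = memory cycle, 0 = I/O cycle */
#define MCTR_LOCK (1u << 4)  /* LOCK was asserted during the cycle */

typedef struct {
    bool valid, write, data, memory, locked;
} mctr_info;

/* Decode the low dword (EAX) of the MCTR as returned by RDMSR. */
mctr_info mctr_decode(uint32_t eax)
{
    mctr_info m;
    m.valid  = (eax & MCTR_CHK)  != 0;
    m.write  = (eax & MCTR_WR)   != 0;
    m.data   = (eax & MCTR_DC)   != 0;
    m.memory = (eax & MCTR_MIO)  != 0;
    m.locked = (eax & MCTR_LOCK) != 0;
    return m;
}
```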














Time Stamp Counter (TSC) 


With each processor clock cycle, the processor increments a
64-bit time stamp counter (TSC) model-specific register. The
counter can be written or read using the WRMSR or RDMSR
instructions when the ECX register contains the value 10h and
CPL = 0. The counter can also be read using the RDTSC
instruction (see page 33), but the required privilege level for
this instruction is determined by the Time Stamp Disable
(TSD) bit in CR4. With any of these instructions, the EDX and
EAX registers hold the upper and lower doublewords (dwords)
of the 64-bit value to be written to or read from the TSC, as
follows:


■ EDX—Upper 32 bits of TSC
■ EAX—Lower 32 bits of TSC


The TSC can be loaded with any value.
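Below is a minimal C sketch of turning two TSC samples into an elapsed time. The helper names are hypothetical. The processor clock frequency must be known separately, and this sketch assumes it is given in MHz.

```c
#include <assert.h>
#include <stdint.h>

/* Clocks between two TSC samples. Unsigned 64-bit subtraction gives the
   right answer even if the counter wrapped once between the samples. */
uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Convert clocks to whole microseconds for a processor clock given in MHz
   (for example, 100 for a 100-MHz part). */
uint64_t tsc_to_us(uint64_t clocks, uint32_t cpu_mhz)
{
    assert(cpu_mhz != 0);
    return clocks / cpu_mhz;
}
```

If software reloads the counter with WRMSR between the two samples, the difference is no longer meaningful.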


Array Access Register (AAR) 


The Array Access Register (AAR) contains pointers for testing 
the tag and data arrays for the instruction cache, data cache, 4- 
Kbyte TLB, and 4-Mbyte TLB. The AAR can be written or read 
with the WRMSR or RDMSR instruction when the ECX regis- 
ter contains the value 82h. 


For details on the AAR, see “Cache and TLB Testing” on page
75.
Hardware Configuration Register (HWCR)


The Hardware Configuration Register (HWCR) contains con- 
figuration bits that control miscellaneous debugging functions. 
The HWCR can be written or read with the WRMSR or 
RDMSR instruction when the ECX register contains the value 
83h. 


For details on the HWCR, see “Hardware Configuration Regis- 
ter (HWCR)” on page 71. 


New Instructions 





In addition to supporting all the 486 processor instructions, the
AMD-K5 processor implements the following instructions:


CPUID 

CMPXCHG8B 

MOV to and from CR4 

RDTSC 

RDMSR 

WRMSR 

RSM 

Illegal instruction (Reserved opcode) 
CPUID

mnemonic opcode description 

CPUID    0F A2h    Identify processor

Privilege: Any level 

Registers Affected: EAX, EBX, ECX, EDX 

Flags Affected: none 


Exceptions Generated: Real, Virtual-8086 mode—none 
Protected mode—none 





The CPUID instruction identifies the type of processor and the features it supports.
The value in the EAX register when CPUID executes (0 or 1) specifies what informa-
tion the instruction returns.


The processor implements the ID flag (bit 21) in the EFLAGS register. By writing and 
reading this bit, software can verify that the processor will execute the CPUID 
instruction. 


For detailed instructions on processor and feature identification, see the AMD
Processor Recognition application note, order# 20734.
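Below is a minimal C sketch of decoding CPUID results that software has already obtained. The code does not execute the instruction. It assumes the conventional register layout: for EAX = 0, the vendor string is returned in EBX, EDX, then ECX; for EAX = 1, the family, model, and stepping are returned in bits 11-8, 7-4, and 3-0 of EAX. The application note above remains the authoritative description. The function names are hypothetical.

```c
#include <stdint.h>

/* Store a register value as 4 ASCII bytes, least significant byte first. */
static void put_le32(char *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((v >> (8 * i)) & 0xFFu);
}

/* CPUID with EAX = 0: 12-character vendor string from EBX, EDX, ECX. */
void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    put_le32(out + 0, ebx);
    put_le32(out + 4, edx);
    put_le32(out + 8, ecx);
    out[12] = '\0';
}

typedef struct { unsigned family, model, stepping; } cpu_signature;

/* CPUID with EAX = 1: signature fields returned in EAX. */
cpu_signature cpuid_signature(uint32_t eax)
{
    cpu_signature s;
    s.stepping = eax & 0xFu;
    s.model    = (eax >> 4) & 0xFu;
    s.family   = (eax >> 8) & 0xFu;
    return s;
}
```

The family and model values decoded this way are the codes listed in Table 1-8.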


Table 1-8 outlines the AMD-K5 processor family codes and model codes with the CPU 
clock frequencies (MHz), bus frequencies (MHz), and P-Rating strings (“PRxxx”). 





Table 1-8A. CPU Clock Frequencies, Bus Frequencies, and P-Rating Strings

Family Code  Model Code  CPU Frequency (MHz)  CPU Bus Frequency (MHz)  P-Rating String (“PRxxx”)¹
5            0           90                   60                       PR90
5            0           100                  66                       PR100
5                        90                   60                       PR120
5                        100                  66                       PR133
5                        120                  60                       PR150
5                        133                  66                       PR166

Notes:
1. The CPUID instruction does not return a P-Rating string.

This table does not constitute product announcements. Instead, the information in the table represents possible product offerings.
AMD will announce actual products based on availability and market demand.
The list below prioritizes the recommended BIOS CPU ID strings. The primary
requirement is that if the CPU clock frequency is displayed, the P-Rating must
also be displayed.


Recommended:
■ “AMD-K5-PRxxx”: No clock or bus frequency information is displayed.

OR
■ “AMD-K5-PRxxx” “yyy MHz” “zzz MHz”: “PRxxx” indicates the P-Rating for the installed
K86™ processor, “yyy MHz” indicates the clock frequency of the processor, and “zzz MHz”
indicates the bus frequency of the processor. Display of the bus frequency is encouraged,
but not required.


Acceptable:
■ “AMD-K5”: The default, recommended if the detected clock frequency is not in the
P-Rating table. The actual frequency should not be displayed anywhere in the boot-up
display.
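Below is a minimal C sketch of a BIOS routine that follows these rules. It assumes the caller has already mapped the detected clock to a P-Rating string, or passes NULL when the clock is not in the P-Rating table. The single-space separators between fields and the function name are assumptions for illustration.

```c
#include <stdio.h>

/* prating: e.g. "PR133", or NULL if the clock is not in the P-Rating table.
   cpu_mhz, bus_mhz: 0 means "do not display".
   Returns the length of the string written into buf (as snprintf does). */
int bios_cpu_string(char *buf, size_t n, const char *prating,
                    unsigned cpu_mhz, unsigned bus_mhz)
{
    if (prating == NULL)   /* default: never show the actual frequency */
        return snprintf(buf, n, "AMD-K5");
    if (cpu_mhz == 0)      /* P-Rating only */
        return snprintf(buf, n, "AMD-K5-%s", prating);
    if (bus_mhz == 0)      /* clock shown with its P-Rating, bus omitted */
        return snprintf(buf, n, "AMD-K5-%s %u MHz", prating, cpu_mhz);
    return snprintf(buf, n, "AMD-K5-%s %u MHz %u MHz", prating, cpu_mhz, bus_mhz);
}
```

Because the clock can only be printed after a P-Rating string, the routine cannot produce the disallowed case of a frequency without a P-Rating.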
CMPXCHG8B


mnemonic opcode description 





CMPXCHG8B r/m64    0F C7h    Compare and exchange 8-byte operand


Privilege: Any level 
Registers Affected: EAX, EBX, ECX, EDX 
Flags Affected: ZF 


Exceptions Generated: Real, Virtual-8086, Protected mode—GP(0). Invalid opcode if destination is a register. 
Virtual-8086 mode—Page fault 





The CMPXCHG8B instruction is an 8-byte version of the 4-byte CMPXCHG instruc-
tion supported by the 486 processor. CMPXCHG8B compares a value from memory
with a value in the EDX and EAX registers, as follows:


■ EDX—Upper 32 bits of compare value

■ EAX—Lower 32 bits of compare value

If the memory value matches the value in EDX and EAX, the ZF flag is set to 1 and
the 8-byte value in ECX and EBX is written to the memory location, as follows:

■ ECX—Upper 32 bits of exchange value

■ EBX—Lower 32 bits of exchange value

Otherwise, the ZF flag is cleared to 0 and the memory value is loaded into EDX and
EAX.
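The compare-and-exchange data flow can be modeled in C as shown below. This sketch is deliberately non-atomic and only shows how EDX:EAX, ECX:EBX, memory, and ZF interact. Real code would use LOCK CMPXCHG8B. A compiler's 64-bit atomic compare-and-swap on a 32-bit x86 target that supports the instruction is typically compiled to it. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic model of CMPXCHG8B m64. Returns the resulting ZF. */
bool cmpxchg8b_model(uint64_t *mem, uint32_t *edx, uint32_t *eax,
                     uint32_t ecx, uint32_t ebx)
{
    uint64_t expected = ((uint64_t)*edx << 32) | *eax;
    if (*mem == expected) {
        *mem = ((uint64_t)ecx << 32) | ebx;  /* store ECX:EBX */
        return true;                         /* ZF = 1 */
    }
    *edx = (uint32_t)(*mem >> 32);           /* load memory value into EDX:EAX */
    *eax = (uint32_t)*mem;
    return false;                            /* ZF = 0 */
}
```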
MOV to and from CR4

mnemonic opcode description 

MOV CR4,r32    0F 22h    Move to CR4 from register

MOV r32,CR4    0F 20h    Move to register from CR4

Privilege: CPL = 0

Registers Affected: CR4, 32-bit general-purpose register 

Flags Affected: OF, SF, ZF, AF, PF, and CF are undefined 


Exceptions Generated: Real mode—none 
Virtual-8086 mode—GP(0) 
Protected mode—GP(0) if CPL not = 0 





These instructions read and write control register 4 (CR4). 
RDTSC

mnemonic opcode description 

RDTSC    0F 31h    Read time stamp counter

Privilege: Selectable by TSD bit in CR4 

Registers Affected: EAX, EDX 

Flags Affected: none 


Exceptions Generated: Real—none 
Virtual-8086 mode—Invalid Opcode 
Protected mode—GP(0) if CPL not = 0 when CR4.TSD = 1





The AMD-K5 processor’s 64-bit time stamp counter (TSC) increments on each processor
clock. In Real or Protected mode, the counter can be read with the RDMSR instruction
and written with the WRMSR instruction when CPL = 0. In Protected mode, however,
the RDTSC instruction can also be used to read the counter at less-privileged levels
(CPL greater than 0), depending on the TSD bit in CR4.


The required privilege level for using the RDTSC instruction is determined by the 
Time Stamp Disable (TSD) bit in CR4, as follows: 


■ CPL = 0 only—Set the TSD bit in CR4 to 1
■ Any CPL—Clear the TSD bit in CR4 to 0
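The TSD rule, together with the Real-mode and Protected-mode exception entries above, can be summarized by the small C model below. The names are hypothetical, and Virtual-8086 mode is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RDTSC_OK, RDTSC_GP0 } rdtsc_result;

/* pe: CR0.PE; cpl: current privilege level (0-3); tsd: CR4.TSD. */
rdtsc_result rdtsc_check(bool pe, unsigned cpl, bool tsd)
{
    assert(cpl <= 3);
    if (!pe)               /* Real mode: no exception */
        return RDTSC_OK;
    if (tsd && cpl != 0)   /* TSD = 1: only CPL = 0 may execute RDTSC */
        return RDTSC_GP0;
    return RDTSC_OK;       /* TSD = 0: any CPL */
}
```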


The RDTSC instruction reads the counter value into the EDX and EAX registers as 
follows: 


■ EDX—Upper 32 bits of TSC
■ EAX—Lower 32 bits of TSC


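The following example shows how the RDTSC instruction can be used. The code must
run at CPL = 0 because it uses WRMSR. WRMSR writes the full 64-bit value in EDX:EAX
to the counter, so EDX is cleared along with EAX. After this code is executed, EAX
and EDX contain the time required to execute the RDTSC instruction.


mov ecx,10h          ;Time Stamp Counter Access via MSRs
mov eax,00000000h    ;Initialize the Counter to zero (lower dword)
mov edx,00000000h    ;Initialize the Counter to zero (upper dword)
db 0Fh, 30h          ;WRMSR
db 0Fh, 31h          ;RDTSC
db 0Fh, 31h          ;RDTSC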
RDMSR and WRMSR

mnemonic opcode description 

RDMSR    0F 32h    Read model-specific register (MSR)

WRMSR    0F 30h    Write model-specific register (MSR)

Privilege: CPL = 0

Registers Affected: EAX, ECX, EDX 

Flags Affected: none 


Exceptions Generated: Real—GP(0) for unimplemented MSR address 


Virtual-8086 mode—GP(0) 
Protected mode—GP(0) if CPL not = 0 
Protected mode—GP(0) for unimplemented MSR address 





The RDMSR or WRMSR instructions can be used in Real or Protected mode to access 
several 64-bit, model-specific registers (MSRs). These registers are addressed by the 
value in ECX, as follows: 


00h: Machine-Check Address Register (MCAR). This may contain the physical 
address of the last bus cycle for which the BUSCHK or PCHK signal was asserted. 
For details, see “Machine-Check Address Register (MCAR)” on page 25. 


01h: Machine-Check Type Register (MCTR). This contains the cycle definition of 
the last bus cycle for which the BUSCHK or PCHK signal was asserted. For 
details, see “Machine-Check Type Register (MCTR)” on page 26. The processor 
clears the CHK bit (bit 0) in MCTR when the register is read with the RDMSR 
instruction. 


10h: Time Stamp Counter (TSC). This contains a time value. The TSC can be ini- 
tialized to any value with the WRMSR instruction, and it can be read with either 
the RDMSR or RDTSC instruction. For details, see “Time Stamp Counter (TSC)” 
on page 27. 

82h: Array Access Register (AAR). This contains an array pointer and test data 
for testing the processor’s cache and TLB arrays. For details on the AAR, see 
“Cache and TLB Testing” on page 75. 

83h: Hardware Configuration Register (HWCR). This contains configuration bits 
that control miscellaneous debugging functions. For details, see “Hardware Con- 
figuration Register (HWCR)” on page 71. 
The value in ECX identifies the register to be read or written. The EDX and EAX
registers contain the MSR values to be read or written, as follows:


■ EDX—Upper 32 bits of MSR. For the AAR, this contains the array pointer and (in
contrast to all other MSRs) its contents are not altered by an RDMSR instruction.

■ EAX—Lower 32 bits of MSR. For the AAR, this contains the data to be read or
written.


All MSRs are 64 bits wide. However, the upper 32 bits of the AAR are write-only and 
are not returned on a read. EDX remains unaltered, making it more convenient to 
maintain the array pointer. 


If an attempt is made to execute either the RDMSR or WRMSR instruction when 
CPL is greater than 0, or to access an undefined model-specific register, the proces- 
sor generates a general-protection exception with error code zero. 


Model-specific registers, as their name implies, may or may not be implemented by
later models of the AMD-K5 processor.
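Below is a minimal C model of the GP(0) conditions listed for RDMSR and WRMSR. It uses the MSR indexes documented in this section. The model is illustrative only: msr_access_faults is a hypothetical name, and Real mode is modeled by passing CPL = 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR indexes documented in this section. */
static bool msr_implemented(uint32_t ecx)
{
    switch (ecx) {
    case 0x00u: /* MCAR */
    case 0x01u: /* MCTR */
    case 0x10u: /* TSC  */
    case 0x82u: /* AAR  */
    case 0x83u: /* HWCR */
        return true;
    default:
        return false;
    }
}

/* True if RDMSR or WRMSR with this ECX would raise GP(0).
   vm86: EFLAGS.VM; cpl: current privilege level (pass 0 for Real mode). */
bool msr_access_faults(uint32_t ecx, bool vm86, unsigned cpl)
{
    assert(cpl <= 3);
    if (vm86)
        return true;                  /* Virtual-8086 mode: always GP(0) */
    if (cpl != 0)
        return true;                  /* Protected mode with CPL > 0 */
    return !msr_implemented(ecx);     /* unimplemented MSR address */
}
```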
RSM

mnemonic opcode description 

RSM    0F AAh    Resume execution (exit System Management Mode)

Privilege: CPL = 0


Registers Affected: CS, DS, ES, FS, GS, SS, EIP, EFLAGS, LDTR, 
CR3, EAX, EBX, ECX, EDX, ESP, EBP, EDI, ESI 

Flags Affected: none 

Exceptions Generated: Real, Virtual-8086 mode—Invalid opcode if not in SMM 
Protected mode—Invalid opcode if not in SMM 
Protected mode—GP(0) if CPL not = 0 





The RSM instruction should be the last instruction in any System Management Mode 
(SMM) service routine. It restores the processor state that was saved when the SMI 
interrupt was asserted. This instruction is only valid when the processor is in SMM. It 
generates an invalid opcode exception at all other times. 


The processor enters the Shutdown state if any of the following illegal conditions are
encountered during the execution of the RSM instruction:

■ The SMM base value is not aligned on a 32-Kbyte boundary.
■ Any reserved bit of CR4 is set to 1.
■ The PG bit is set while the PE bit is cleared in CR0.
■ The NW bit is set while the CD bit is cleared in CR0.
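The Shutdown conditions can be written as a C predicate over the state that RSM restores. The CR0 bit positions (PE = 0, NW = 29, CD = 30, PG = 31) are the standard x86 ones. The set of defined CR4 bits is our reading of Figure 1-1 and should be treated as an assumption: VME, PVI, TSD, DE, and PSE at bits 0-4, MCE at bit 6, and GPE at bit 7. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1u << 0)
#define CR0_NW (1u << 29)
#define CR0_CD (1u << 30)
#define CR0_PG (1u << 31)

/* Assumed defined CR4 bits: 0-4, 6, and 7. All other bits are treated as reserved. */
#define CR4_DEFINED 0x000000DFu

/* True if restoring this state with RSM would enter the Shutdown state. */
bool rsm_would_shutdown(uint32_t smm_base, uint32_t cr0, uint32_t cr4)
{
    if (smm_base & 0x7FFFu)                  /* not 32-Kbyte aligned */
        return true;
    if (cr4 & ~CR4_DEFINED)                  /* reserved CR4 bit set */
        return true;
    if ((cr0 & CR0_PG) && !(cr0 & CR0_PE))   /* PG set while PE clear */
        return true;
    if ((cr0 & CR0_NW) && !(cr0 & CR0_CD))   /* NW set while CD clear */
        return true;
    return false;
}
```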
Illegal Instruction (Reserved Opcode)





mnemonic opcode description 

(none)    0F FFh    Illegal instruction (reserved opcode)
Privilege: Any level 

Registers Affected: none 

Flags Affected: none 


Exceptions Generated: Real, Virtual-8086 mode—Invalid opcode 
Protected mode—Invalid opcode 





This opcode always generates an invalid opcode exception. The opcode will not be 
used in future AMD K86 processors. 
Code Optimization for the
AMD-K5 Processor 





This chapter provides guidelines for writing code that executes
quickly on the AMD-K5 processor, together with details on
dispatch and execution timing for x86 instructions. Throughout
the chapter, the terms clock and cycle refer to processor clock
cycles, not bus clock (CLK) cycles.


Code Optimization 





The code optimization suggestions in this section cover both
general superscalar optimization (that is, techniques common
to both the AMD-K5 and Pentium processors) and techniques
specific to the AMD-K5 processor. In general, all optimization
techniques used for the Pentium processor apply to any wide-
issue x86 processor, but wider-issue designs like the AMD-K5
processor have fewer restrictions.


General Superscalar Techniques 


■ Short Forms—Use shorter forms of instructions to increase
the effective number of instructions that can be examined
for decoding at any one time. Use 8-bit displacements and
jump offsets where possible.

■ Simple Instructions—Use simple instructions with hard-
wired decode because they often perform more efficiently.
Moreover, future implementations may increase the penal-
ties associated with microcoded instructions.

■ Dependencies—Spread out true dependencies to increase
the opportunities for parallel execution. Antidependencies
and output dependencies do not impact performance.

■ Memory Operands—Instructions that operate on data in
memory (load/op/store) can inhibit parallelism. Using sepa-
rate move and ALU instructions allows independent opera-
tions to be performed in parallel. On the other hand, if
there are no opportunities for parallel execution, use the
load/op/store forms to reduce the number of register spills
(storing register values in memory to free registers for
other uses) and increase code density.

■ Register Operands—Maintain frequently used values in reg-
isters or on the stack rather than in static storage.

■ Branch Prediction—Use control-flow constructs that allow
effective branch prediction. Although correctly predicted
branches have no cost, mispredicted branches incur a
three-clock penalty.

■ Stack References—Use ESP for references to the stack so
that EBP remains available for general use.

■ Stack Allocation—When placing outgoing parameters on the
stack, allocate space by adjusting the stack pointer (prefer-
ably at the same time local storage is allocated on proce-
dure entry) and use moves rather than pushes. This method
of allocation allows random access to the outgoing parame-
ters so that they can be set up when they are calculated,
instead of having to be held somewhere else until the proce-
dure call. This method also uses fewer execution resources
(specifically, fewer register-file write ports when updating
ESP).

■ Shifts—Although there is only one shifter, certain shifts can
be done using other execution units: for example, shift left
by 1 by adding a value to itself. Use LEA index scaling to
shift left by 1, 2, or 3.

■ Data Embedded in Code—When data is embedded in the
code segment, align it in separate cache blocks from nearby
code to avoid some overhead in maintaining coherency
between the instruction and data caches.

■ Undefined Flags—Do not rely on the behavior of undefined
flag results.

■ Loops—Unroll loops to get more parallelism and reduce
loop overhead even with branch prediction. Inline small
routines to avoid procedure-call overhead. In both cases,
however, consider the cost of possible increased register
usage, which might add load/store instructions for register
spilling.

■ Indexed Addressing—There is no penalty for base + index
addressing in the AMD-K5 processor. However, future
implementations may have such a penalty to achieve a
higher overall clock rate.
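Below is a minimal C sketch of loop unrolling combined with spreading out true dependencies. The four accumulators make the additions within one iteration independent of each other. It shows only the source-level shape. The names are hypothetical, and the instruction sequence a compiler emits may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Array sum unrolled by 4 with four independent accumulators. */
uint32_t sum_unrolled4(const uint32_t *a, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {   /* four independent additions per pass */
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)             /* remaining 0-3 elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

The four accumulators, the pointer, and the index together occupy most of the eight general registers. This is the register-pressure trade-off noted under Loops.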


Techniques Specific to the AMD-K5 Processor 


■ Jumps and Loops—JCXZ requires 1 cycle (correctly pre-
dicted) and therefore is faster than a TEST/JZ, in contrast
to the Pentium processor in which JCXZ requires 5 or 6
cycles. All forms of LOOP take 2 cycles (correctly pre-
dicted), which is also faster than the Pentium processor's 7
or 8 cycles.

■ Multiplies—Independent IMULs can be pipelined at one
per cycle with 4-cycle latency, in contrast to the Pentium
processor's serialized 9-cycle time. (MUL has the same
latency, although the implicit AX usage of MUL prevents
independent, parallel MUL operations.)

■ Dispatch Conflicts—Load balancing (that is, selecting
instructions for parallel decode) is still important, but to a
lesser extent than on the Pentium processor. In particular,
arrange instructions to avoid execution-unit dispatching
conflicts. (See page 43.)

■ Instruction Prefixes—There is no penalty for instruction pre-
fixes, including combinations such as segment-size and
operand-size prefixes. This is particularly important for
16-bit code. However, future implementations may have
penalties for the use of these prefixes.

■ Byte Operations—For byte operations, the high and low
bytes of AX, BX, CX, and DX are effectively independent
registers that can be operated on in parallel. For example,
reading AL does not have a dependency on an outstanding
write to AH.

■ Move and Convert—MOVZX, MOVSX, CBW, CWDE, CWD,
and CDQ all take 1 cycle (2 cycles for memory-based input),
in contrast to the Pentium processor's 2 or 3 cycles.

■ Bit Scan—BSF and BSR take 1 cycle (2 cycles for memory-
based input), in contrast to the Pentium processor's data-
dependent 6 to 34 cycles.

■ Bit Test—BT, BTS, BTR, and BTC take 1 cycle for register-
based operands, and 2 or 3 cycles for memory-based oper-
ands with immediate bit-offset, in contrast to the Pentium
processor's 4 to 9 cycles. Register-based bit-offset forms on
the AMD-K5 processor take 5 cycles. If the semantics of the
register-based bit-offset form are desired (where the bit off-
set can cover a very large bit string in memory), it is better
to emulate this with simpler instructions that can be inter-
leaved with independent instructions for greater
parallelism.

■ Floating-Point Top-of-Stack Bottleneck—The AMD-K5 proces-
sor has a pipelined floating-point unit. Greater parallelism
can be achieved by using FXCH in parallel with floating-
point operations to alleviate the top-of-stack bottleneck, as
in the Pentium processor. The AMD-K5 processor also per-
mits integer operations (ALU, branch, load/store) in paral-
lel with floating-point operations.

■ Locating Branch Targets—Performance can be sensitive to
code alignment, especially in tight loops. Locating branch
targets within the first 17 bytes of the 32-byte cache line
maximizes the opportunity for parallel execution at the
target. NOPs can be added to adjust this alignment. The
AMD-K5 processor executes NOPs (opcode 90h) at the rate
of two per cycle. Adding NOPs is even more effective if they
execute in parallel with existing code. Other instructions of
greater length, such as a register-based TEST instruction,
can be used as NOPs to minimize the overhead of such
padding.

■ Branch Prediction—There are two branch prediction bits in
a 32-byte instruction cache line. One bit applies to the first
16 bytes of the line and the second bit applies to the second
16 bytes of the line. For effective branch prediction, code
should be generated with one branch per 16-byte line half.

■ Address-Generation Interlocks (AGIs)—The AMD-K5 proces-
sor does not suffer from the single-cycle penalty that the
486 and Pentium processors have when a result from execu-
tion or from a data-cache access is used to form a cache
address, so it is not necessary to avoid these situations.
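Three of the guidelines above reduce to simple address arithmetic. The C sketch below (hypothetical helper names) covers each in turn. First, it computes NOP padding that moves a branch target into the first 17 bytes of its 32-byte line by padding to the start of the next line, which is only one possible policy. Second, it checks whether two branches share a 16-byte prediction half. Third, it emulates a register-offset BT on a bit string with shift and AND operations, assuming a non-negative bit offset.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 32u  /* instruction cache line size */
#define HALF_BYTES 16u  /* each half line has its own prediction bit */

/* Padding bytes so that a target at addr lies at offset 0-16 of its line.
   Assumed policy: if it does not, pad to the start of the next line. */
uint32_t target_padding(uint32_t addr)
{
    uint32_t off = addr % LINE_BYTES;
    return (off <= 16u) ? 0u : LINE_BYTES - off;
}

/* True if two branches (identified by an address within each instruction)
   fall in the same 16-byte half line and so share one prediction bit. */
bool same_prediction_half(uint32_t branch_a, uint32_t branch_b)
{
    return branch_a / HALF_BYTES == branch_b / HALF_BYTES;
}

/* Bit-string test with simple operations instead of BT mem, reg:
   byte index = offset / 8, bit number = offset % 8. */
bool bit_test(const uint8_t *bits, uint32_t offset)
{
    return (bits[offset >> 3] >> (offset & 7u)) & 1u;
}
```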
Dispatch and Execution Timing





This section documents functional unit usage for each instruc- 
tion, along with relative cycle numbers for dispatch and execu- 
tion of the associated ROPs for the instruction. 


Notation 


Table 2-1 contains the definitions for the integer instructions. 
Table 2-3 contains the definitions for the floating-point instruc- 
tions. The first column in these tables indicates the instruction 
mnemonic and operand types. The following notations are used 
in the AMD-K5 microprocessor documentation: 


reg—register
mem—memory location
imm—immediate value
int_16—16-bit integer
int_32—32-bit integer
int_64—64-bit integer
real_32—32-bit floating-point number
real_64—64-bit floating-point number
real_80—80-bit floating-point number


If an operand refers to a specific register, the register name is
used (e.g., AX, DX). When the register name is of the form Exx
(e.g., EAX, ESI), the width of the register depends on the oper-
and-size attribute.
The second column contains an identifier with the following
format:


X_XX_XXXXXXXX_XXX_XXX


From left to right, the fields are:

■ Opcode length: 1 = two-byte opcode (0F xx), 0 = one-byte opcode
■ Addressing mode: 0x = register; 10 = memory without index;
  1x = memory with or without index; 11 = memory with index
■ Opcode
■ MODrm[5:3]
■ MODrm[2:0]
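Because the identifier uses x as a wildcard, it lends itself to pattern matching. Below is a minimal C sketch that builds an identifier from concrete encoding fields and matches it against a pattern from Table 2-1. The names format_id and id_matches are hypothetical, and the caller must supply concrete addressing-mode bits.

```c
#include <stdbool.h>

/* Write nbits of value as '0'/'1' characters, most significant first. */
static int put_bits(char *out, unsigned value, int nbits)
{
    for (int b = nbits - 1; b >= 0; b--)
        *out++ = ((value >> b) & 1u) ? '1' : '0';
    return nbits;
}

/* Build X_XX_XXXXXXXX_XXX_XXX (21 characters plus NUL) for one encoding.
   mode: two concrete addressing-mode characters, e.g. "10". */
void format_id(char out[22], bool two_byte, const char *mode,
               unsigned opcode, unsigned reg, unsigned rm)
{
    int p = 0;
    out[p++] = two_byte ? '1' : '0';
    out[p++] = '_';
    out[p++] = mode[0];
    out[p++] = mode[1];
    out[p++] = '_';
    p += put_bits(out + p, opcode, 8);
    out[p++] = '_';
    p += put_bits(out + p, reg, 3);   /* MODrm[5:3] */
    out[p++] = '_';
    p += put_bits(out + p, rm, 3);    /* MODrm[2:0] */
    out[p] = '\0';
}

/* True if id matches pattern; 'x' in the pattern matches either bit. */
bool id_matches(const char *pattern, const char *id)
{
    for (; *pattern != '\0' && *id != '\0'; pattern++, id++)
        if (*pattern != 'x' && *pattern != *id)
            return false;
    return *pattern == '\0' && *id == '\0';
}
```

For instance, take opcode 89h with MODrm reg = 011 and rm = 011, using memory addressing without an index. This encoding yields 0_10_10001001_011_011, which matches the MOV mem, reg pattern 0_10_1000100x_xxx_xxx.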








The third column in the tables indicates whether the instruc- 
tion is Fastpath (F) or Microcoded (M). Fastpath and MROM 
ROPs cannot both be present in a decode stage at the same 
time. If a microcoded instruction appears at the head of the 
byte queue without having been present in the queue on the 
previous cycle, there is a one-cycle penalty for MROM entry 
point generation. 


Each x86 instruction is converted into one or more ROPs. The
fourth column shows the execution unit and timing for each of
the ROPs. The ROP types and corresponding execution units
are:

■ ld—load/store
■ st—load/store
■ alu—either alu0 or alu1
■ alu0—alu0 only
■ alu1—alu1 only
■ brn—branch
■ fadd—floating-point add pipe
■ fmul—floating-point multiply pipe
■ fpmv—floating-point move and compare pipe
■ fpfill—floating-point upper half
The x/y value following the ROP type indicates the relative dis-
patch and execution cycle of the ROP, in the absence of any
conflicts. The format is:


x/y[/z]


where:


■ x = Dispatch Cycle—The relative cycle in which the ROP is
dispatched from decode to the reservation station.

■ y = Execution Cycle—The relative cycle in which the ROP is
issued from the reservation station to the execution unit.

■ z = Result Cycle—The relative cycle in which the result is
returned on the result bus. It is indicated only when the
latency is greater than one cycle. For stores, it reflects the
relative time that a store operand can be forwarded from
the store buffer to a dependent load operation.


Taking the clock in which the first ROP of an instruction is dis-
patched as clock 1, the x/y value indicates in which clock each
ROP is dispatched and executed relative to clock 1. The execu-
tion order and timing do not necessarily match the dispatch
order and timing.


For instructions that read from or write to memory, the timings
assume that the data is present in the cache.
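Below is a minimal C sketch of reading the x/y[/z] timing notation. The names are hypothetical. When z is omitted, this sketch assumes it equals y, which is its reading of a one-cycle latency.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int dispatch;  /* x: dispatch from decode to the reservation station */
    int execute;   /* y: issue from the reservation station to the unit */
    int result;    /* z: result returned on the result bus */
} rop_timing;

/* Parse "x/y" or "x/y/z". Returns false if fewer than two numbers are found. */
bool parse_timing(const char *s, rop_timing *t)
{
    int n = sscanf(s, "%d/%d/%d", &t->dispatch, &t->execute, &t->result);
    if (n < 2)
        return false;
    if (n == 2)                 /* z omitted: assume one-cycle latency */
        t->result = t->execute;
    return true;
}
```

For example, the st entry 1/1/3 in the ADD mem, reg row parses to dispatch 1, execute 1, and result 3.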
Integer Instructions


Table 2-1 shows the execution-unit usage for each integer 
instruction, along with relative cycle numbers for dispatch and 
execution of the associated ROPs for the instruction. 





Table 2-1. Integer Instructions

Instruction Mnemonic        Opcode Format          F/M  Execution Unit and Timing
ADD reg, reg                0_0x_000000xx_xxx_xxx  F    alu 1/1
ADD reg, mem                0_1x_0000001x_xxx_xxx  F    ld 1/1; alu 1/2
ADD mem, reg                0_1x_0000000x_xxx_xxx  F    ld 1/1; alu 1/2; st 1/1/3
ADD AL/AX/EAX, imm          0_xx_0000010x_xxx_xxx  F    alu 1/1
ADD reg, imm                0_0x_100000xx_000_xxx  F    alu 1/1
ADD mem, imm                0_1x_100000xx_000_xxx  F    ld 1/1; alu 1/2; st 1/1/3
AND reg, reg                0_0x_001000xx_xxx_xxx  F    alu 1/1
AND reg, mem                0_1x_0010001x_xxx_xxx  F    ld 1/1; alu 1/2
AND mem, reg                0_1x_0010000x_xxx_xxx  F    ld 1/1; alu 1/2; st 1/1/3
AND AL/AX/EAX, imm          0_xx_0010010x_xxx_xxx  F    alu 1/1
AND reg, imm                0_0x_100000xx_100_xxx  F    alu 1/1
AND mem, imm                0_1x_100000xx_100_xxx  F    ld 1/1; alu 1/2; st 1/1/3
BSF reg, reg                1_0x_10111100_xxx_xxx  F    alu1 1/1
BSF reg, mem                1_1x_10111100_xxx_xxx  F    ld 1/1; alu1 1/2
BSR reg, reg                1_0x_10111101_xxx_xxx  F    alu1 1/1
BSR reg, mem                1_1x_10111101_xxx_xxx  F    ld 1/1; alu1 1/2
BSWAP reg                   1_xx_11001xxx_xxx_xxx  F    alu1 1/1
BT reg, reg                 1_0x_10100011_xxx_xxx  F    alu1 1/1
Table 2-1. Integer Instructions (continued)

Instruction Mnemonic        Opcode Format          F/M  Execution Unit and Timing
BT mem, reg                 1_1x_10100011_xxx_xxx  M    alu1 1/1; alu 1/2; alu 2/3; ld 2/4; alu1 3/5
BT reg, imm                 1_0x_10111010_100_xxx  F    alu1 1/1
BT mem, imm                 1_1x_10111010_100_xxx  F    ld 1/1; alu1 1/2
BTC reg, reg                1_0x_10111011_xxx_xxx  F    alu1 1/1
BTC mem, reg                1_1x_10111011_xxx_xxx  M    alu1 1/1; alu 1/2; alu 2/3; ld 2/4; alu1 3/5; st 3/5/6
BTC reg, imm                1_0x_10111010_111_xxx  F    alu1 1/1
BTC mem, imm                1_1x_10111010_111_xxx  F    ld 1/1; alu1 1/2; st 1/1/3
BTR reg, reg                1_0x_10110011_xxx_xxx  F    alu1 1/1
BTR mem, reg                1_1x_10110011_xxx_xxx  M    alu1 1/1; alu 1/2; alu 2/3; ld 2/4; alu1 3/5; st 3/5/6
BTR reg, imm                1_0x_10111010_110_xxx  F    alu1 1/1
BTR mem, imm                1_1x_10111010_110_xxx  F    ld 1/1; alu1 1/2; st 1/1/3
BTS reg, reg                1_0x_10101011_xxx_xxx  F    alu1 1/1
BTS mem, reg                1_1x_10101011_xxx_xxx  M    alu1 1/1; alu 1/2; alu 2/3; ld 2/4; alu1 3/5; st 3/5/6
BTS reg, imm                1_0x_10111010_101_xxx  F    alu1 1/1
Table 2-1. Integer Instructions (continued)

Instruction Mnemonic        Opcode Format          F/M  Execution Unit and Timing
BTS mem, imm                1_1x_10111010_101_xxx  F    ld 1/1; alu1 1/2; st 1/1/3
CALL near relative          0_xx_11101000_xxx_xxx  M    alu 1/1; st; brn 1/1
CALL near reg               0_0x_11111111_010_xxx  M    alu 1/1; st; brn 1/1
CALL near mem               0_1x_11111111_010_xxx  M    alu 1/1; ld 1/1; st 1/1/2; alu 1/1; brn 2/2
CBW/CWDE                    0_xx_10011000_xxx_xxx  F    alu1 1/1
CMP reg, reg                0_0x_001110xx_xxx_xxx  F    alu 1/1
CMP reg, mem                0_1x_0011101x_xxx_xxx  F    ld 1/1; alu 1/2
CMP mem, reg                0_1x_0011100x_xxx_xxx  F    ld 1/1; alu 1/2
CMP AL/AX/EAX, imm          0_xx_0011110x_xxx_xxx  F    alu 1/1
CMP reg, imm                0_0x_100000xx_111_xxx  F    alu 1/1
CMP mem, imm                0_1x_100000xx_111_xxx  F    ld 1/1; alu 1/2
CWD/CDQ                     0_xx_10011001_xxx_xxx  F    alu1 1/1
DEC reg                     0_xx_01001xxx_xxx_xxx  F    alu 1/1
DEC reg                     0_0x_1111111x_001_xxx  F    alu 1/1
DEC mem                     0_1x_1111111x_001_xxx  F    ld 1/1; alu 1/2; st 1/1/3
IMUL AX, AL, reg            0_0x_11110110_101_xxx  F    fpfill 1/1/4; fmul 1/1/4
IMUL EDX:EAX, EAX, reg      0_0x_11110111_101_xxx  F    fpfill 1/1/4; fmul 1/1/4
IMUL reg, reg               1_0x_10101111_xxx_xxx  F    fpfill 1/1/4; fmul 1/1/4
Table 2-1. Integer Instructions (continued)

Instruction Mnemonic        Opcode Format          F/M  Execution Unit and Timing
IMUL reg, reg, imm          0_0x_011010x1_xxx_xxx  F    fpfill 1/1/4; fmul 1/1/4
IMUL AX, AL, mem            0_1x_11110110_101_xxx  F    ld 1/1; fpfill 1/2/4; fmul 1/2/4
IMUL EDX:EAX, EAX, mem      0_1x_11110111_101_xxx  F    ld 1/1; fpfill 1/2/4; fmul 1/2/4
IMUL reg, mem               1_1x_10101111_xxx_xxx  F    ld 1/1; fpfill 1/2/4; fmul 1/2/4
IMUL reg, reg, mem          0_1x_011010x1_xxx_xxx  F    ld 1/1; fpfill 1/2/4; fmul 1/2/4
INC reg                     0_xx_01000xxx_xxx_xxx  F    alu 1/1
INC reg                     0_0x_1111111x_000_xxx  F    alu 1/1
INC mem                     0_1x_1111111x_000_xxx  F    ld 1/1; alu 1/2; st 1/1/3
Jcc short displacement      0_xx_0111xxxx_xxx_xxx  F    brn 1/1
Jcc long displacement       1_xx_1000xxxx_xxx_xxx  F    brn 1/1
JCXZ short displacement     0_xx_11100011_xxx_xxx  F    brn 1/1
JMP long displacement       0_xx_11101001_xxx_xxx  F    brn 1/1
JMP short displacement      0_xx_11101011_xxx_xxx  F    brn 1/1
JMP reg                     0_0x_11111111_100_xxx  F    brn 1/1
JMP mem                     0_1x_11111111_100_xxx  F    ld 1/1; brn 1/2
LEA                         0_1x_10001101_xxx_xxx  F    ld 1/1
LOOP short displacement     0_xx_11100010_xxx_xxx  F    alu 1/1; brn 1/2
LOOPE short displacement    0_xx_11100001_xxx_xxx  M    alu 1/1; brn 1/2
LOOPNE short displacement   0_xx_11100000_xxx_xxx  M    alu 1/1; brn 1/2
MOV reg, reg                0_0x_100010xx_xxx_xxx  F    alu 1/1
MOV reg, mem                0_1x_1000101x_xxx_xxx  F    ld 1/1
MOV mem, reg                0_10_1000100x_xxx_xxx  F    st 1/1
Table 2-1. Integer Instructions (continued)

Instruction Mnemonic        Opcode Format          F/M  Execution Unit and Timing
MOV mem, reg (base + index) 0_11_1000100x_xxx_xxx  F    ld 1/1; st 1/2/3
MOV AL/AX/EAX, mem          0_xx_1010000x_xxx_xxx  F    ld 1/1
MOV mem, AL/AX/EAX          0_xx_1010001x_xxx_xxx  F    st 1/1
MOV reg, imm                0_0x_1100011x_000_xxx  F    alu 1/1
MOV reg, imm                0_xx_1011xxxx_xxx_xxx  F    alu 1/1
MOV mem, imm                0_10_1100011x_000_xxx  F
MOV mem, imm (base + index) 0_11_1100011x_000_xxx  F    alu 1/1; ld 1/1; st 1/2/3
MOVSX reg, reg              1_0x_1011111x_xxx_xxx  F    alu1 1/1
MOVSX reg, mem              1_1x_1011111x_xxx_xxx  F    ld 1/1; alu1 1/2
MOVZX reg, reg              1_0x_1011011x_xxx_xxx  F    alu 1/1
MOVZX reg, mem              1_1x_1011011x_xxx_xxx  F    ld 1/1; alu 1/2
MUL AX, AL, reg             0_0x_11110110_100_xxx  F    fpfill 1/1/4; fmul 1/1/4
MUL EDX:EAX, EAX, reg       0_0x_11110111_100_xxx  F    fpfill 1/1/4; fmul 1/1/4
MUL AX, AL, mem             0_1x_11110110_100_xxx  F    ld 1/1; fpfill 1/2/4; fmul 1/2/4
MUL EDX:EAX, EAX, mem       0_1x_11110111_100_xxx  F    ld 1/1; fpfill 1/2/4; fmul 1/2/4
NEG reg                     0_0x_1111011x_011_xxx  F    alu 1/1
NEG mem                     0_1x_1111011x_011_xxx  F    ld 1/1; alu 1/2; st 1/1/3
NOP (XCHG EAX, EAX)         0_xx_10010000_xxx_xxx  F    alu 1/1
NOT reg                     0_0x_1111011x_010_xxx  F    alu 1/1
NOT mem                     0_1x_1111011x_010_xxx  F    ld 1/1; alu 1/2; st 1/1/3
OR reg, reg                 0_0x_000010xx_xxx_xxx  F    alu 1/1
OR reg, mem                 0_1x_0000101x_xxx_xxx  F            ld         1/1
                                                                alu        1/2
OR mem, reg                 0_1x_0000100x_xxx_xxx  F            ld         1/1
                                                                alu        1/2
                                                                st         1/1/3
OR AL/AX/EAX, imm           0_xx_0000110x_xxx_xxx  F            alu        1/1
OR reg, imm                 0_0x_100000xx_001_xxx  F            alu        1/1
OR mem, imm                 0_1x_100000xx_001_xxx  F            ld         1/1
                                                                alu        1/2
                                                                st         1/1/3
POP reg                     0_xx_01011xxx_xxx_xxx  F            ld         1/1
                                                                alu        1/1
POP reg                     0_0x_10001111_000_xxx  F            ld         1/1
                                                                alu        1/1
POP mem                     0_1x_10001111_000_xxx  M            ld         1/1
                                                                ?          ?
                                                                st         2/2/3
                                                                alu        2/2
PUSH reg                    0_xx_01010xxx_xxx_xxx  F            ?          ?
                                                                alu        1/1/2
PUSH reg                    0_0x_11111111_110_xxx  F            ?          ?
                                                                alu        1/1/2
PUSH imm                    0_xx_011010x0_xxx_xxx  F            alu        1/1
                                                                st         1/1/2
PUSH mem                    0_1x_11111111_110_xxx  M            alu        1/1
                                                                ld         1/1
                                                                st         1/1/2
                                                                alu        1/1
RET near                    0_xx_11000011_xxx_xxx  F            ld         1/1
                                                                alu        1/1
                                                                brn        1/2
RET near imm                0_xx_11000010_xxx_xxx  M            ld         1/1
                                                                alu        1/1
                                                                alu        1/2
                                                                brn        1/2
ROL reg, 1                  0_0x_1101000x_000_xxx  F            alu1       1/1
ROL mem, 1                  0_1x_1101000x_000_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
AMD-K5 Processor Software Development Guide 20007D/0—Sep1996 

ROL reg, imm                0_0x_1100000x_000_xxx  F            alu1       1/1
ROL mem, imm                0_1x_1100000x_000_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
ROL reg, CL                 0_0x_1101001x_000_xxx  F            alu1       1/1
ROL mem, CL                 0_1x_1101001x_000_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
ROR reg, 1                  0_0x_1101000x_001_xxx  F            alu1       1/1
ROR mem, 1                  0_1x_1101000x_001_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
ROR reg, imm                0_0x_1100000x_001_xxx  F            alu1       1/1
ROR mem, imm                0_1x_1100000x_001_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
ROR reg, CL                 0_0x_1101001x_001_xxx  F            alu1       1/1
ROR mem, CL                 0_1x_1101001x_001_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SAR reg, 1                  0_0x_1101000x_111_xxx  F            alu1       1/1
SAR mem, 1                  0_1x_1101000x_111_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SAR reg, imm                0_0x_1100000x_111_xxx  F            alu1       1/1
SAR mem, imm                0_1x_1100000x_111_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SAR reg, CL                 0_0x_1101001x_111_xxx  F            alu1       1/1
SAR mem, CL                 0_1x_1101001x_111_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SETcc reg                   1_0x_1001xxxx_xxx_xxx  F            brn        1/1
SETcc mem                   1_1x_1001xxxx_xxx_xxx  F            brn        1/1
                                                                ld         1/1
                                                                st         1/2/3
SHL reg, 1                  0_0x_1101000x_1x0_xxx  F            alu1       1/1
SHL mem, 1                  0_1x_1101000x_1x0_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SHL reg, imm                0_0x_1100000x_1x0_xxx  F            alu1       1/1
SHL mem, imm                0_1x_1100000x_1x0_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SHL reg, CL                 0_0x_1101001x_1x0_xxx  F            alu1       1/1
SHL mem, CL                 0_1x_1101001x_1x0_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SHLD reg, reg, imm          1_0x_10100100_xxx_xxx  F            alu1       1/1
                                                                alu1       2/2
SHLD mem, reg, imm          1_1x_10100100_xxx_xxx  M            alu1       1/1
                                                                ld         1/1
                                                                alu1       2/2
                                                                st         2/2/3
SHLD reg, reg, CL           1_0x_10100101_xxx_xxx  F            alu1       1/1
                                                                alu1       2/2
SHLD mem, reg, CL           1_1x_10100101_xxx_xxx  M            alu1       1/1
                                                                ld         1/1
                                                                alu1       2/2
                                                                st         2/2/3
SHR reg, 1                  0_0x_1101000x_101_xxx  F            alu1       1/1
SHR mem, 1                  0_1x_1101000x_101_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SHR reg, imm                0_0x_1100000x_101_xxx  F            alu1       1/1
SHR mem, imm                0_1x_1100000x_101_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SHR reg, CL                 0_0x_1101001x_101_xxx  F            alu1       1/1
SHR mem, CL                 0_1x_1101001x_101_xxx  F            ld         1/1
                                                                alu1       1/2
                                                                st         1/1/3
SHRD reg, reg, imm          1_0x_10101100_xxx_xxx  F            alu1       1/1
                                                                alu1       2/2
SHRD mem, reg, imm          1_1x_10101100_xxx_xxx  M            alu1       1/1
                                                                ld         1/1
                                                                alu1       2/2
                                                                st         2/2/3
SHRD reg, reg, CL           1_0x_10101101_xxx_xxx  F            alu1       1/1
                                                                alu1       2/2
SHRD mem, reg, CL           1_1x_10101101_xxx_xxx  M            alu1       1/1
                                                                ld         1/1
                                                                alu1       2/2
                                                                st         2/2/3
SUB reg, reg                0_0x_001010xx_xxx_xxx  F            alu        1/1
SUB reg, mem                0_1x_0010101x_xxx_xxx  F            ld         1/1
                                                                alu        1/2
SUB mem, reg                0_1x_0010100x_xxx_xxx  F            ld         1/1
                                                                alu        1/2
                                                                st         1/1/3
SUB AL/AX/EAX, imm          0_xx_0010110x_xxx_xxx  F            alu        1/1
SUB reg, imm                0_0x_100000xx_101_xxx  F            alu        1/1
SUB mem, imm                0_1x_100000xx_101_xxx  F            ld         1/1
                                                                alu        1/2
                                                                st         1/1/3
TEST reg, reg               0_0x_1000010x_xxx_xxx  F            alu        1/1
TEST mem, reg               0_1x_1000010x_xxx_xxx  F            ld         1/1
                                                                alu        1/2
TEST reg, imm               0_0x_1111011x_00x_xxx  F            alu        1/1
TEST AL/AX/EAX, imm         0_xx_1010100x_xxx_xxx  F            alu        1/1
TEST mem, imm               0_1x_1111011x_00x_xxx  F            ld         1/1
                                                                alu        1/2
XCHG EAX, reg (except EAX)  0_xx_10010xxx_xxx_xxx  F            alu        1/1
                                                                alu        1/1
                                                                alu        2/2
XCHG reg, reg               0_0x_1000011x_xxx_xxx  F            alu        1/1
                                                                alu        1/1
                                                                alu        2/2
XCHG mem, reg               0_1x_1000011x_xxx_xxx  F            ld         1/1
                                                                st         1/1/2
                                                                alu        1/2
XOR reg, reg                0_0x_001100xx_xxx_xxx  F            alu        1/1
XOR reg, mem                0_1x_0011001x_xxx_xxx  F            ld         1/1
                                                                alu        1/2
XOR mem, reg                0_1x_0011000x_xxx_xxx  F            ld         1/1
                                                                alu        1/2
                                                                st         1/1/3
XOR AL/AX/EAX, imm          0_xx_0011010x_xxx_xxx  F            alu        1/1
XOR reg, imm                0_0x_100000xx_110_xxx  F            alu        1/1
XOR mem, imm                0_1x_100000xx_110_xxx  F            ld         1/1
                                                                alu        1/2
                                                                st         1/1/3
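
Each Opcode Format entry above reads as a bit pattern in which x marks a don't-care position and the underscores only separate fields. A minimal sketch of matching an instruction against one row follows. It assumes the caller has already packed the instruction's bits into a string of 0/1 characters in the same field order as the table (a leading bit that appears to flag two-byte 0Fh opcodes, two operand-class bits, the opcode byte, then the ModR/M reg and r/m fields); that field interpretation and the function name opcode_format_matches are assumptions for illustration, not definitions from this table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when `bits` (one '0'/'1' character per bit position)
 * matches an Opcode Format string from Table 2-1. Underscores in the
 * pattern are separators and are skipped; 'x' matches either bit. */
bool opcode_format_matches(const char *pattern, const char *bits)
{
    assert(pattern != NULL && bits != NULL);
    size_t j = 0;                         /* position in bits */
    for (size_t i = 0; pattern[i] != '\0'; i++) {
        char p = pattern[i];
        if (p == '_')
            continue;                     /* visual separator only */
        char b = bits[j];
        if (b != '0' && b != '1')
            return false;                 /* bits too short or malformed */
        if (p != 'x' && p != b)
            return false;                 /* a fixed bit differs */
        j++;
    }
    return bits[j] == '\0';               /* no extra bits left over */
}
```

For example, DEC ECX (opcode 49h, 01001001b) packed with zero flag, class, reg, and r/m bits as 00001001001000000 matches the first DEC reg row, 0_xx_01001xxx_xxx_xxx, while INC ECX (41h) does not.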


Integer Dot Product Example 


This example illustrates an optimal code sequence for an integer
dot product operation that performs multiply/accumulates (MACs)
at the rate of one every 3 cycles. In this example, the array
size is a constant. The loop is unrolled to perform separate MAC
operations in parallel for even and odd elements. The final sum
is generated outside the loop (as well as the final iteration for
odd-sized arrays).


;on entry (assumed): ESI -> A, EDI -> B, ECX = EDX = EBP = 0
mac_loop:
    MOV  EAX, [ESI][ECX*4]      ;load A(i)
    MOV  EBX, [ESI][ECX*4]+4    ;load A(i+1)
    IMUL EAX, [EDI][ECX*4]      ;A(i) * B(i)
    IMUL EBX, [EDI][ECX*4]+4    ;A(i+1) * B(i+1)
    ADD  ECX, 2                 ;increment index
    ADD  EDX, EAX               ;even sum
    ADD  EBP, EBX               ;odd sum
    CMP  ECX, EVEN_ARRAY_SIZE   ;loop control
    JL   mac_loop               ;jump
    ;do final MAC here for odd-sized arrays
    ADD  EDX, EBP               ;final sum
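
The same even/odd unrolling can be sketched in C for readers who want to check the arithmetic the loop performs. This is illustrative only: the function name dot_product_unrolled is hypothetical, a C compiler will not necessarily emit the instruction sequence above, and unlike the assembly (which assumes a constant, even EVEN_ARRAY_SIZE) it takes the length n as a parameter and handles the odd element itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product unrolled by two: even-indexed products go to one
 * accumulator and odd-indexed products to another (EDX and EBP in the
 * assembly), so the two multiplies in an iteration are independent.
 * Unsigned arithmetic makes overflow wrap like 32-bit IMUL/ADD. */
int32_t dot_product_unrolled(const int32_t *a, const int32_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    uint32_t even_sum = 0, odd_sum = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even_sum += (uint32_t)a[i] * (uint32_t)b[i];
        odd_sum  += (uint32_t)a[i + 1] * (uint32_t)b[i + 1];
    }
    if (i < n)                  /* final MAC for odd-sized arrays */
        even_sum += (uint32_t)a[i] * (uint32_t)b[i];
    return (int32_t)(even_sum + odd_sum);   /* final sum */
}
```

Keeping two accumulators instead of one is what removes the dependency between consecutive ADDs, which is the point of the unrolling in the assembly version.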





Table 2-2 shows the timing of internal operations from dispatch
to retire of each ROP for nearly two iterations of this loop. All
memory accesses are assumed to hit in the cache.
EVEN_ARRAY_SIZE is set to 20.

Table 2-2. Integer Dot Product Internal Operations Timing

[Timing chart: one row per instruction, one column per cycle, entries use the notation in the Notes]

Instruction
MOV  EAX, [ESI][ECX*4]
MOV  EBX, [ESI][ECX*4]+4
IMUL EAX, [EDI][ECX*4]
IMUL EBX, [EDI][ECX*4]+4
ADD  ECX, 2
ADD  EDX, EAX
ADD  EBP, EBX
CMP  ECX, 20
JL   LOOP
MOV  EAX, [ESI][ECX*4]
MOV  EBX, [ESI][ECX*4]+4
IMUL EAX, [EDI][ECX*4]
IMUL EBX, [EDI][ECX*4]+4

Notes:
L    load execute
M    multiply execute
A    ALU execute
B    branch execute
>    result
-    retire (update real state)
--   preceding execute: waiting in the reservation station


Floating-Point Instructions


Floating-point ROPs are always dispatched in pairs to the FPU 
reservation station. The first ROP conveys the lower halves of 
the A and B operands, and it always has the fpfill ROP type. 
The second ROP conveys the upper halves of the operands, as 
well as the numeric opcode. Data from both ROPs is merged in 
the reservation station and must be converted into an internal 
floating-point format before it can be issued to the add pipe 
(fadd), multiply pipe (fmul), or detect pipe (fmv). It takes one 
cycle to perform the conversion, and this delay is incurred 
whenever the source of the data is the register file or one of 
the other functional units (e.g., load/store, ALU). If data is 
being forwarded from the FPU itself, however, no format
conversion is required and operands are fast-forwarded from the
back end of a pipe to the front of any other pipe without the 
one-cycle delay. 


The add/subtract/reverse FPU latencies assume that cancellation
does not occur in the adder/subtractor. If cancellation does
occur, an extra cycle is required to normalize the result.
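
The two latency adjustments in the preceding paragraphs can be written down as a small cost model. This sketch encodes only those two rules; the enum and the function name fp_extra_cycles are hypothetical, and it estimates added cycles rather than total latency because it does not model anything else in the pipeline.

```c
#include <assert.h>
#include <stdbool.h>

/* Where an FPU operand comes from (hypothetical names). */
enum fp_operand_source {
    FP_SRC_REGISTER_FILE,  /* must be converted to the internal format */
    FP_SRC_OTHER_UNIT,     /* e.g., load/store or ALU: also converted */
    FP_SRC_FPU_FORWARD     /* fast-forwarded from an FPU pipe: no conversion */
};

/* Extra cycles implied by the rules above: one cycle of format conversion
 * unless the operand is forwarded from the FPU itself, plus one
 * normalization cycle when cancellation occurs in the adder/subtractor. */
int fp_extra_cycles(enum fp_operand_source src, bool add_pipe_op,
                    bool cancellation)
{
    assert(src == FP_SRC_REGISTER_FILE || src == FP_SRC_OTHER_UNIT ||
           src == FP_SRC_FPU_FORWARD);
    int extra = (src == FP_SRC_FPU_FORWARD) ? 0 : 1;
    if (add_pipe_op && cancellation)
        extra += 1;
    return extra;
}
```

For example, an add whose operand arrives from the load/store unit and whose result cancels would see two extra cycles, while a multiply fed directly from the add pipe would see none.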


Table 2-3 shows the execution-unit usage for each floating-point
instruction, along with relative cycle numbers for dispatch and
execution of the associated ROPs for the instruction.
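
Each Timing entry in Tables 2-1 and 2-3 is a short list of relative cycle numbers separated by slashes, such as 1/2/5 or 2/7/10. A minimal parser sketch for tools that post-process the tables follows; the function name parse_timing is hypothetical, and it only splits the numbers without assigning each position a meaning beyond what the paragraph above states.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parses a Timing entry such as "1/2/5" into at most `max` integers.
 * Returns how many values were parsed, or -1 for malformed input
 * (empty string, trailing slash, non-digit, or too many values). */
int parse_timing(const char *s, int *out, int max)
{
    assert(s != NULL && out != NULL && max > 0);
    int count = 0;
    while (*s != '\0') {
        if (!isdigit((unsigned char)*s) || count == max)
            return -1;
        int v = 0;
        while (isdigit((unsigned char)*s))
            v = v * 10 + (*s++ - '0');
        out[count++] = v;
        if (*s == '/') {
            s++;
            if (*s == '\0')
                return -1;        /* trailing slash */
        } else if (*s != '\0') {
            return -1;            /* unexpected character */
        }
    }
    return count > 0 ? count : -1;
}
```

Entries that are illegible in a given copy of the tables (shown as ?) are rejected, so a caller can tell missing data from a real value.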





Table 2-3. Floating-Point Instructions

                                                   Fastpath or  Execution
Instruction Mnemonic        Opcode Format          Microcoded   Unit       Timing
FABS                        0_0x_11011001_100_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
FADD ST, ST(i)              0_0x_11011000_000_xxx  F            fpfill     1/2/5
                                                                fadd       1/2/5
FADD ST(i), ST              0_0x_11011100_000_xxx  F            fpfill     1/2/5
                                                                fadd       1/2/5
FADD real_32                0_1x_11011000_000_xxx  F            ld         1/1
                                                                fpfill     1/3/6
                                                                fadd       1/3/6
FADD real_64                0_1x_11011100_000_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/4/7
                                                                fadd       1/4/7
FADDP ST(i), ST             0_0x_11011110_000_xxx  F            fpfill     1/2/5
                                                                fadd       1/2/5
FCHS                        0_0x_11011001_100_xxx  F            fpfill     1/2/4
                                                                fchs       1/2/4
FCOM ST(i)                  0_0x_11011x00_010_xxx  F            fpfill     1/2/4
                                                                fcmpst     1/2/4
FCOM real_32                0_1x_11011000_010_xxx  F            ld         1/1
                                                                fpfill     1/3/5
                                                                fmv        1/3/5
FCOM real_64                0_1x_11011100_010_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/4/6
                                                                fadd       1/4/6
FCOMP ST(i)                 0_0x_11011x00_011_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
                                                                alu        1/1
FCOMP real_32               0_1x_11011000_011_xxx  F            ld         1/1
                                                                fpfill     1/3/5
                                                                fmv        1/3/5
FCOMP real_64               0_1x_11011100_011_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/4/6
                                                                fadd       1/4/6
FCOMPP                      0_0x_11011110_011_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
                                                                nop        1/1/2
FDECSTP                     0_0x_11011001_110_xxx  M            ?          ?
                                                                alu        1/1/2
FIADD int_16                0_1x_11011110_000_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/10
                                                                fadd       2/7/10
FIADD int_32                0_1x_11011010_000_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/10
                                                                fadd       2/7/10
FICOM int_16                0_1x_11011110_010_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/9
                                                                fmv        2/7/9
FICOM int_32                0_1x_11011010_010_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/9
                                                                fmv        2/7/9
FICOMP int_16               0_1x_11011110_011_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/9
                                                                fmv        2/7/9
FICOMP int_32               0_1x_11011010_011_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/9
                                                                fmv        2/7/9
FILD int_16                 0_1x_11011111_000_xxx  F            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
FILD int_32                 0_1x_11011011_000_xxx  F            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
FILD int_64                 0_1x_11011111_101_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/4/8
                                                                fadd       1/4/8
FIMUL int_16                0_1x_11011110_001_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/11
                                                                fmul       2/7/11
FIMUL int_32                0_1x_11011010_001_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/11
                                                                fmul       2/7/11
FIST int_16                 0_1x_11011111_010_xxx  M            ld         1/1
                                                                fpfill     1/2/5
                                                                fadd       1/2/5
                                                                st         1/5/6
FIST int_32                 0_1x_11011011_010_xxx  M            ld         1/1
                                                                fpfill     1/2/5
                                                                fadd       1/2/5
                                                                st         1/5/6
FISTP int_16                0_1x_11011111_011_xxx  M            ld         1/1
                                                                fpfill     1/2/5
                                                                fadd       1/2/5
                                                                st         1/5/6
FISTP int_32                0_1x_11011011_011_xxx  M            ld         1/1
                                                                fpfill     1/2/5
                                                                fadd       1/2/5
                                                                st         1/5/6
FISTP int_64                0_1x_11011111_111_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/2/5
                                                                fadd       1/2/5
                                                                st         2/3/6
                                                                st         2/4/7
FISUB int_16                0_1x_11011110_100_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/10
                                                                fadd       2/7/10
FISUB int_32                0_1x_11011010_100_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/10
                                                                fadd       2/7/10
FISUBR int_16               0_1x_11011110_101_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/10
                                                                fadd       2/7/10
FISUBR int_32               0_1x_11011010_101_xxx  M            ld         1/1
                                                                fpfill     1/3/7
                                                                fadd       1/3/7
                                                                fpfill     2/7/10
                                                                fadd       2/7/10
FLD real_32                 0_1x_11011001_000_xxx  F            ld         1/1
                                                                fpfill     1/3/5
                                                                fmv        1/3/5
FLD real_64                 0_1x_11011101_000_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/4/6
                                                                fmv        1/4/6
FLD real_80                 0_1x_11011011_101_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/6/8
                                                                fmv        1/6/8
FLD ST(i)                   0_0x_11011001_000_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
                                                                nop        1/1
FMUL ST, ST(i)              0_0x_11011000_001_xxx  F            fpfill     1/2/8
                                                                fmul       1/2/8
FMUL ST(i), ST              0_0x_11011100_001_xxx  F            fpfill     1/2/8
                                                                fmul       1/2/8
FMUL real_32                0_1x_11011000_001_xxx  F            ld         1/1
                                                                fpfill     1/3/7
                                                                fmul       1/3/7
FMUL real_64                0_1x_11011100_001_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/4/10
                                                                fmul       1/4/10
FMULP ST, ST(i)             0_0x_11011110_001_xxx  F            fpfill     1/2/8
                                                                fmul       1/2/8
FMULP ST(i), ST             0_0x_11011110_001_xxx  F            fpfill     1/2/8
                                                                fmul       1/2/8
FNOP                        0_0x_11011001_010_xxx  F            alu        1/1/2
                                                                ?          ?
FRNDINT                     0_0x_11011001_111_xxx  F            fpfill     1/2/9
                                                                fadd       1/2/9
FSCALE                      0_0x_11011001_111_xxx  F            fpfill     1/2/8
                                                                fadd       1/2/8
FST real_32                 0_1x_11011001_010_xxx  M            ld         1/1
                                                                fpfill     1/2/4
                                                                fmv        1/2/4
                                                                st         1/2/5
FST ST(i)                   0_0x_11011101_010_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
FSTP real_32                0_1x_11011001_011_xxx  M            ld         1/1
                                                                fpfill     1/2/4
                                                                fmv        1/2/4
                                                                st         1/2/5
FSTP real_64                0_1x_11011101_011_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/2/4
                                                                fmv        1/2/4
                                                                st         2/3/5
                                                                st         2/4/6
FSTP real_80                0_1x_11011011_111_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/2/4
                                                                fmv        1/2/4
                                                                st         2/3/5
                                                                st         2/4/6
FSTP ST(i)                  0_0x_11011x01_011_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
FSUB ST, ST(i)              0_0x_11011000_100_xxx  F            fpfill     1/2/5
                                                                fadd       1/2/5
FSUB ST(i), ST              0_0x_11011100_100_xxx  F            fpfill     1/2/5
                                                                fadd       1/2/5
FSUB real_32                0_1x_11011000_100_xxx  F            ld         1/1
                                                                fpfill     1/3/6
                                                                fadd       1/3/6
FSUB real_64                0_1x_11011100_100_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/4/7
                                                                fadd       1/4/7
FSUBP ST(i), ST             0_0x_11011110_100_xxx  F            fpfill     1/2/5
                                                                fadd       1/2/5
FSUBR ST, ST(i)             0_0x_11011000_101_xxx  F            fpfill     1/2/5
                                                                fadd       1/2/5
FSUBR ST(i), ST             0_0x_11011100_101_xxx  F            fpfill     1/2/5
                                                                fadd       1/2/5
FSUBR real_32               0_1x_11011000_101_xxx  F            ld         1/1
                                                                fpfill     1/3/6
                                                                fadd       1/3/6
FSUBR real_64               0_1x_11011100_101_xxx  M            ld         1/1
                                                                ld         1/2
                                                                fpfill     1/4/7
                                                                fadd       1/4/7
FSUBRP ST(i), ST            0_0x_11011110_101_xxx  F            fpfill     1/2/5
                                                                fadd       1/2/5
FTST                        0_0x_11011001_100_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
FUCOM ST(i)                 0_0x_11011101_100_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
FUCOMP ST(i)                0_0x_11011101_101_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
                                                                nop        1/1
FUCOMPP                     0_0x_11011010_101_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
                                                                nop        1/1
FWAIT                       0_xx_10011011_xxx_xxx  F            alu        1/1
FXAM                        0_0x_11011001_100_xxx  F            fpfill     1/2/4
                                                                fmv        1/2/4
FXCH ST(i)                  0_0x_11011001_001_xxx  F            brn        1/1
FXTRACT                     0_0x_11011001_110_xxx  M            fpfill     1/2/4
                                                                fmv        1/2/4
                                                                fpfill     2/3/11
                                                                fadd       2/3/11
                                                                fpfill     3/4/6
                                                                fmv        3/4/6


AMD-K5 Processor Initialization





The internal state of the AMD-K5 processor can be initialized 
to known values via either the RESET or INIT signal. RESET 
takes effect immediately, asynchronously to whatever the
processor may be doing. INIT is recognized only at the next
instruction boundary after assertion. RESET provides a
complete initialization, whereas INIT provides only a subset of
this. Specifically, INIT does not affect the numeric coprocessor 
state or the cache contents. The initialized internal state is 
described in the following paragraphs. Except where explicitly 
noted, the resulting state is the same for both RESET and 
INIT. 


General Registers 





All general registers except EAX and EDX are cleared. EDX is 
loaded with the processor ID value. This is the value returned 
by issuing the CPUID instruction with a 1 in EAX (see 
“CPUID” on page 29). EAX is normally cleared, although if 
BIST is run along with reset and an error is detected, EAX will 
be loaded with a BIST error code. 
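
Because EDX receives the same value that CPUID function 1 returns, code that runs immediately after reset can decode it without executing CPUID. A sketch using the conventional x86 signature layout (stepping in bits 3-0, model in bits 7-4, family in bits 11-8) follows; that layout is the standard CPUID format rather than something this section spells out, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional x86 processor-signature fields (CPUID function 1). */
struct cpu_signature {
    unsigned stepping;  /* bits 3-0 */
    unsigned model;     /* bits 7-4 */
    unsigned family;    /* bits 11-8 */
};

/* Splits the processor ID value left in EDX after reset into fields. */
struct cpu_signature decode_signature(uint32_t id)
{
    struct cpu_signature s;
    s.stepping = id & 0xFu;
    s.model    = (id >> 4) & 0xFu;
    s.family   = (id >> 8) & 0xFu;
    return s;
}
```

For example, an ID value of 0000_0523h decodes as family 5, model 2, stepping 3; that value is only an illustration, not a specific AMD-K5 signature.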


Segment Registers


The selector portion of all segment registers is cleared. The
access rights and attribute fields are set up as shown in Table
3-1.

Table 3-1. Segment Register Attribute Fields Initial Values

Attribute Field   Value   Description
G                 0       Byte granularity
D/B               0       16-bit
P                 1       Present
DPL               0       Privilege level
S                 1       Application segment (except LDTR)
Type              2       Data, read/write

The limit fields are set to FFFFh. For CS, the base address is
set to FFFF_0000h; for all others the base address is 0. Note
that IDTR and GDTR consist of just the base and limit values,
which are initialized to 0 and FFFFh, respectively.
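
The P, DPL, S, and Type attributes in Table 3-1 correspond to the access-rights byte of a standard x86 segment descriptor (P in bit 7, DPL in bits 6-5, S in bit 4, Type in bits 3-0). Assuming that standard layout, the reset values pack into 92h, which a short sketch can confirm; the function name access_rights_byte is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the access-rights byte of an x86 segment descriptor:
 * bit 7 = P, bits 6-5 = DPL, bit 4 = S, bits 3-0 = Type. */
uint8_t access_rights_byte(unsigned p, unsigned dpl, unsigned s, unsigned type)
{
    assert(p <= 1 && dpl <= 3 && s <= 1 && type <= 15);
    return (uint8_t)((p << 7) | (dpl << 5) | (s << 4) | type);
}
```

With P = 1, DPL = 0, S = 1, and Type = 2 (data, read/write), the result is 80h + 10h + 2h = 92h.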


EIP and EFLAGS 





All bits of EFLAGS are cleared, with the exception of bit 1,
which is hardwired to a 1. EIP is set to 0000_FFF0h.
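
Combined with the CS base of FFFF_0000h described under Segment Registers, this EIP value puts the first instruction fetch at FFFF_FFF0h, 16 bytes below the top of the 4-Gbyte address space. The arithmetic is simply base plus offset; a sketch with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of an instruction fetch in the reset state: the CS base
 * set at RESET/INIT plus EIP, wrapping at 32 bits. */
uint32_t fetch_address(uint32_t cs_base, uint32_t eip)
{
    return cs_base + eip;
}
```

fetch_address(0xFFFF0000, 0x0000FFF0) gives 0xFFFFFFF0, which is why system ROM must be mapped at the top of the address space.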


Control and Debug Registers 





On RESET, CR0 is initialized to 6000_0010h; the NW and CD
bits are set to disable the caches. On INIT, the NW and CD bits
retain their prior state. Note that the ET bit is always set. CR2,
CR3, and CR4 are cleared. Debug registers 0-3 are cleared.
DR6 is set to FFFF_0FF0h, and DR7 is set to 0000_0400h (bit
10 is hardwired to a 1).
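
With CD (bit 30), NW (bit 29), and ET (bit 4) set and every other bit clear, the CR0 reset value works out to 6000_0010h. A sketch that names those architectural bit positions and tests the cache-disabled state, for example when firmware checks whether it still has to enable the caches after RESET; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1u << 0)   /* protection enable */
#define CR0_ET (1u << 4)   /* extension type */
#define CR0_NW (1u << 29)  /* not write-through */
#define CR0_CD (1u << 30)  /* cache disable */

/* True when both CD and NW are set, the cache-disabled state at RESET. */
bool cr0_caches_disabled(uint32_t cr0)
{
    return (cr0 & (CR0_CD | CR0_NW)) == (CR0_CD | CR0_NW);
}
```

Because INIT leaves NW and CD unchanged, the same check after INIT reports whatever cache state software had established before.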


Model-Specific Registers





The HWCR (Hardware Configuration Register) is cleared. On 
RESET, the TSC (Time Stamp Counter) is cleared, although it 
starts incrementing some clocks before the first instruction is 
fetched. INIT does not affect the TSC. 


Caches and TLB 





All TLB entries are invalidated; all cache Tag Valid bits are 
cleared on RESET. All other cache contents are undefined. On 
INIT, the Tag Valid bits, as well as all other cache contents, 
retain their prior state. 


Floating-Point Unit 





The state of the FPU is initialized by RESET only; it is
unaffected by INIT. On RESET, the FP instruction address, data
address, opcode, Status Word, and Control Word are all 
cleared (note that FP Control Word bit 6 is hardwired to 1). 
The FP Tag Word is set to 5555h. All entries in the FP stack are 
initialized to 0. 
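
The value 5555h is consistent with the stack contents: in the standard x87 tag encoding each physical register has a 2-bit tag (00 valid, 01 zero, 10 special, 11 empty), and 5555h tags all eight registers 01, zero. A decoding sketch, assuming that standard encoding; the function name fp_tag is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tag of physical FP register `reg` (0-7) in the FP Tag Word, using the
 * standard x87 encoding: 0 = valid, 1 = zero, 2 = special, 3 = empty. */
unsigned fp_tag(uint16_t tag_word, unsigned reg)
{
    assert(reg < 8);
    return ((unsigned)tag_word >> (2 * reg)) & 3u;
}
```

Every call fp_tag(0x5555, r) for r from 0 to 7 returns 1, matching the stack entries initialized to 0.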


AMD-K5 Processor Test and Debug





The AMD-K5 processor has the following modes in which
processor and system operation can be tested or debugged:


Hardware Configuration Register (HWCR)—The HWCR is a 
model-specific register that contains configuration bits that 
enable cache, branch tracing, debug, and clock control 
functions. 


Built-In Self-Test (BIST)—Both normal and test access port 
(TAP) BIST. 


Output-Float Test—A test mode that causes the AMD-K5 
processor to float all of its output and bidirectional signals. 


Cache and TLB Testing—The Array Access Register (AAR) 
supports writes and reads to any location in the tag and 
data arrays of the processor’s on-chip caches and TLBs. 


Debug Registers—Standard 486 debug functions, with an I/O- 
breakpoint extension. 


Branch Tracing—A pair of special bus cycles can be driven 
immediately after taken branches to specify information 
about the branch instruction and its target. The Hardware 
Configuration Register (HWCR) provides support for this 
and other debug functions. 


Functional Redundancy Checking—Support for real-time 
testing that uses two processors in a master-checker 
relationship. 

Test Access Port (TAP) Boundary-Scan Testing—The JTAG
test access functions defined by the IEEE Standard Test
Access Port and Boundary-Scan Architecture (IEEE 1149.1-1990)
specification.


Hardware Debug Tool (HDT)—The hardware debug tool
(HDT), sometimes referred to as the debug port or Probe 
mode, is a collection of signals, registers, and processor 
microcode that is enabled when external debug logic drives 
R/S Low or loads the AMD-K5 processor’s Test Access Port 
(TAP) instruction register with the USEHDT instruction. 


The test-related signals are described in Chapter 5 of the 
AMD-K5 Processor Technical Reference Manual. The signals 
include the following: 


FLUSH 
FRCMC 
IERR
INIT 
PRDY 
R/S 
RESET 
TCK 
TDI 
TDO 
TMS 
TRST 


The sections that follow provide details on each of the test and 
debug features. 


Hardware Configuration Register (HWCR)





The Hardware Configuration Register (HWCR) is a model- 
specific register (MSR) that contains configuration bits that 
enable cache, branch tracing, debug, and clock control
functions. The WRMSR and RDMSR instructions access the HWCR
when the ECX register contains the value 83h, as described on 
page 34. Figure 4-1 and Table 4-1 show the format and fields of 
the HWCR. 


Bits 31-8   Reserved
Bit 7       DDC    Disable Data Cache
Bit 6       DIC    Disable Instruction Cache
Bit 5       DBP    Disable Branch Prediction
Bits 3-1    DC     Debug Control:
                   000  Off
                   001  Enable branch trace messages
                   100  Activate Probe mode on debug trap
Bit 0       DSPC   Disable Stopping Processor Clocks


Figure 4-1. Hardware Configuration Register (HWCR)

Table 4-1. Hardware Configuration Register (HWCR) Fields

Bit    Mnemonic  Description                  Function
31-8   -         -                            reserved
7      DDC       Disable Data Cache           Disables data cache.
                                              0 = enabled, 1 = disabled.
6      DIC       Disable Instruction Cache    Disables instruction cache.
                                              0 = enabled, 1 = disabled.
5      DBP       Disable Branch Prediction    Disables branch prediction.
                                              0 = enabled, 1 = disabled.
4      -         -                            reserved
3-1    DC        Debug Control                Debug control bits:
                                              000  Off (disable HWCR debug control).
                                              001  Enable branch-tracing messages. See
                                                   “Branch Tracing” on page 85.
                                              010  reserved
                                              011  reserved
                                              100  HDT trap
                                              101  reserved
                                              110  reserved
                                              111  reserved
0      DSPC      Disable Stopping             Disables stopping of internal processor
                 Processor Clocks             clocks in the Halt and Stop Grant states.
                                              0 = enabled, 1 = disabled.

Notes:
Documentation on the Hardware Debug Tool (HDT) is available from AMD under a
nondisclosure agreement.
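
Table 4-1 is enough to compose an HWCR value for WRMSR with ECX = 83h: the disable flags are single bits and DC is a 3-bit code shifted into bits 3-1. A sketch under that layout; the macro and function names are hypothetical, and reserved bits are left 0.

```c
#include <assert.h>
#include <stdint.h>

#define HWCR_DSPC (1u << 0)   /* Disable Stopping Processor Clocks */
#define HWCR_DBP  (1u << 5)   /* Disable Branch Prediction */
#define HWCR_DIC  (1u << 6)   /* Disable Instruction Cache */
#define HWCR_DDC  (1u << 7)   /* Disable Data Cache */
#define HWCR_DC_SHIFT 1       /* Debug Control occupies bits 3-1 */

/* Builds an HWCR value from disable flags and a 3-bit DC code. */
uint32_t hwcr_value(uint32_t disable_flags, unsigned dc)
{
    assert(dc <= 7);
    assert((disable_flags &
            ~(HWCR_DSPC | HWCR_DBP | HWCR_DIC | HWCR_DDC)) == 0);
    return disable_flags | ((uint32_t)dc << HWCR_DC_SHIFT);
}

/* Extracts the DC field (bits 3-1) from an HWCR value. */
unsigned hwcr_debug_control(uint32_t hwcr)
{
    return (hwcr >> HWCR_DC_SHIFT) & 7u;
}
```

For example, DC = 001b (branch-tracing messages) with the caches and branch prediction enabled gives 0000_0002h.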


Built-In Self-Test (BIST)





Normal BIST 


The processor supports the following types of built-in self-test:

■ Normal BIST—A built-in self-test mode typically used to
  test system functions after RESET

■ Test Access Port (TAP) BIST—A self-test mode started by the
  TAP instruction, RUNBIST


All internal arrays except the TLB are tested in parallel by 
hardware. The TLB is tested by microcode. Unlike the Pentium 
processor, the AMD-K5 processor does not report parity errors 
on IERR for every cache or TLB access. Instead, the AMD-K5 
processor fully tests its caches during the BIST. EADS should 
not be asserted during a BIST. The processor accesses the
physical tag array during BISTs, and these accesses can conflict
with inquire cycles. 


The normal BIST is invoked if INIT is asserted at the falling 
edge of RESET. The BIST runs tests on the internal hardware 
that exercise the following resources: 


■ Instruction cache:
  • Linear tag directory
  • Instruction array
  • Physical tag directory
■ Data cache:
  • Linear tag directory
  • Data array
  • Physical tag directory
■ Entry-point and instruction-decode PLAs
■ Microcode ROM
■ TLB


The BIST runs a linear feedback shift register (LFSR) signature
test on the microcode ROM in parallel with a March C test
on the instruction cache, data cache, and physical tags. This is
followed by the March C test on the TLB arrays and then an
LFSR signature test on the PLA, in that order. Upon completion
of the PLA test, the processor transfers the test result
from an internal Hardware Debug Tool (HDT) data register to
the EAX register for external access, resets the internal
microcode, and begins normal code fetching.


The result of the BIST can be accessed by reading the lower 9 
bits of the EAX register. If the EAX register value is 
0000_0000h, the test completed successfully. If the value is not 
zero, the non-zero bits indicate where the failure occurred, as 
shown in Table 4-2. The processor continues with its normal 
boot process after the BIST completes, whether the BIST 
passed or failed. 


Table 4-2. BIST Error Bit Definition in EAX Register

              Bit Value
Bit Number    0           1
31-9          No Error    Always 0
8             No Error    Data path
7             No Error    Instruction-cache instructions
6             No Error    Instruction-cache linear tags
5             No Error    Data-cache linear tags
4             No Error    PLA
3             No Error    Microcode ROM
2             No Error    Data-cache data
1             No Error    Instruction-cache physical tags
0             No Error    Data-cache physical tags
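
Software that boots after a normal BIST can check EAX against Table 4-2 before the register is reused. A sketch that returns the first failing resource (lowest set bit) or NULL on a pass; the names are hypothetical, the strings are copied from Table 4-2, and a real handler might report every set bit rather than only the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Failing resource for EAX bits 0-8, indexed by bit number (Table 4-2). */
static const char *const bist_bit_names[9] = {
    "Data-cache physical tags",         /* bit 0 */
    "Instruction-cache physical tags",  /* bit 1 */
    "Data-cache data",                  /* bit 2 */
    "Microcode ROM",                    /* bit 3 */
    "PLA",                              /* bit 4 */
    "Data-cache linear tags",           /* bit 5 */
    "Instruction-cache linear tags",    /* bit 6 */
    "Instruction-cache instructions",   /* bit 7 */
    "Data path",                        /* bit 8 */
};

/* Returns the resource named by the lowest set error bit in `eax`, or
 * NULL when the BIST passed (EAX = 0000_0000h). */
const char *bist_first_failure(uint32_t eax)
{
    assert((eax >> 9) == 0);            /* bits 31-9 are always 0 */
    for (unsigned bit = 0; bit < 9; bit++)
        if (eax & (1u << bit))
            return bist_bit_names[bit];
    return NULL;
}
```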


Test Access Port (TAP) BIST 


The TAP BIST performs all of the functions of the normal 
BIST, up to and including the PLA signature test, in the exact 
manner as the normal BIST. However, after the PLA test, the 
test result is not transferred to the EAX register. 


The TAP BIST is started by loading and executing the RUNBIST
instruction in the test access port, as described in
“Boundary Scan Architecture Support” on page 87. When the
RUNBIST instruction is executed, the processor enters into a
reset mode that is identical to that entered when the RESET
signal is asserted. Upon completion of the TAP BIST, the result
remains in the BIST result register for shifting out through the
TDO signal. The TRST signal must be asserted or the TAP
instruction must be changed in order to exit TAP BIST and
return to normal operation.


Output-Float Test 





The Output-Float Test mode is entered if FLUSH is asserted 
before the falling edge of RESET. This causes the processor to 
place all of its output and bidirectional signals in the high- 
impedance state. In this isolated state, system board traces and 
connections can be tested for integrity and driveability. The 
Output-Float Test mode can only be exited by asserting RESET 
again. 


On the AMD-K5 and Pentium processors, FLUSH is an
edge-triggered interrupt. On the 486 processor, however, the
signal is a level-sensitive input.


Cache and TLB Testing 





Cache and TLB testing is often done by the BIOS or operating 
system during power-up. These arrays can be tested using the 
Array Access Register (AAR). The following tests can be 
performed: 


■ Data Cache—8-Kbyte, 4-way, set associative
  • Data array
  • Linear-tag array
  • Physical-tag array
■ Instruction Cache—16-Kbyte, 4-way, set associative
  • Instruction array
  • Linear-tag array
  • Physical-tag array
  • Valid-bit array
  • Branch-prediction bit array
■ 4-Kbyte TLB—128-entry, 4-way, set associative
  • Linear-tag array
  • Page array
■ 4-Mbyte TLB—4-entry, fully associative
  • Linear-tag array
  • Page array
Note: For more information on cache arrays, see Appendix A. 
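
The geometries in the list above fix how the TLB arrays are addressed: a set-associative array has entries divided by ways sets, so the 128-entry, 4-way TLB has 32 sets (selected by 5 index bits), while the fully associative 4-entry TLB is a single set. A sketch of that arithmetic; the helper names are hypothetical. The caches follow the same rule once a line size is known, which this section does not give.

```c
#include <assert.h>

/* Sets in a set-associative array: entries divided by ways.
 * A fully associative array is the case ways == entries (one set). */
unsigned num_sets(unsigned entries, unsigned ways)
{
    assert(ways > 0 && entries % ways == 0);
    return entries / ways;
}

/* Index bits needed to select one of `sets` sets (a power of two). */
unsigned index_bits(unsigned sets)
{
    assert(sets > 0 && (sets & (sets - 1)) == 0);
    unsigned bits = 0;
    while ((1u << bits) < sets)
        bits++;
    return bits;
}
```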


Array Access Register (AAR) 


The 64-bit Array Access Register (AAR) is a model-specific 
register (MSR) that contains a 32-bit array pointer, which
identifies the array location to be tested, and 32 bits of array
test data to be read or written. The WRMSR and RDMSR
instructions access the AAR when the ECX register contains the value
82h, as described on page 34. Figure 4-2 shows the format of 
the AAR. 


MSR 82h
  Array Pointer: 32 bits (contents of EDX)
  Array Data: bits 31-0 (contents of EAX)


Figure 4-2. Array Access Register (AAR)





To read or write an array location, perform the following steps: 
1. ECX—Enter 82h into ECX to access the 64-bit AAR. 


2. EDX—Enter a 32-bit array pointer into EDX, as shown in 
Figures 4-3 through 4-8 (top). 


3. EAX—Read or write 32 bits of array test data to or from 
EAX, as shown in Figures 4-3 through 4-8 (bottom). 
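
RDMSR and WRMSR move a 64-bit MSR through EDX:EAX with EDX holding the upper 32 bits, so for the AAR the array pointer forms the upper half and the test data the lower half. A sketch of that packing, which can be handy when logging AAR accesses as single 64-bit values; the function names are hypothetical, and the RDMSR/WRMSR instructions themselves must execute at privilege level 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combines the array pointer (EDX) and test data (EAX) into the 64-bit
 * AAR value that WRMSR with ECX = 82h writes. */
uint64_t aar_pack(uint32_t array_pointer, uint32_t test_data)
{
    return ((uint64_t)array_pointer << 32) | test_data;
}

/* Splits a 64-bit AAR value back into its EDX and EAX halves. */
void aar_unpack(uint64_t aar, uint32_t *array_pointer, uint32_t *test_data)
{
    assert(array_pointer != NULL && test_data != NULL);
    *array_pointer = (uint32_t)(aar >> 32);
    *test_data = (uint32_t)aar;
}
```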

The array pointers entered in EDX (Figures 4-3 through 4-8,
top) specify particular array locations. For example, in the
data- and instruction-cache arrays, the way (or column) and set
(or index) in the array pointer specify a cache line in the
4-way, set-associative array. The array pointers for data-cache
data and instruction-cache instructions also specify a dword
location within that cache line. In the data cache, this dword is
32 bits of data; in the instruction cache, this dword is two
instruction bytes plus their associated pre-decode bits. For the
4-Kbyte TLB, the way and set specify one of the 128 TLB
entries. In the 4-Mbyte TLB, one of only four entries is specified.


Bits 7-0 of every array pointer encode the array ID, which
identifies the array to be accessed, as shown in Table 4-3. To
simplify multiple accesses to an array, the contents of EDX are
retained after the RDMSR instruction executes (EDX is
normally cleared after a RDMSR instruction).


Table 4-3. Array IDs in Array Pointers

Array ID   Accessed Array
E0h        Data Cache: Data
E1h        Data Cache: Linear Tag
ECh        Data Cache: Physical Tag
E4h        Instruction Cache: Instructions
E5h        Instruction Cache: Linear Tag
EDh        Instruction Cache: Physical Tag
E6h        Instruction Cache: Valid Bits
E7h        Instruction Cache: Branch-Prediction Bits
E8h        4-Kbyte TLB: Page
E9h        4-Kbyte TLB: Linear Tag
EAh        4-Mbyte TLB: Page
EBh        4-Mbyte TLB: Linear Tag
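
Table 4-3 can be expressed as a lookup keyed on bits 7-0 of EDX, which can help when logging array-test accesses. A minimal Python sketch; the function name is hypothetical, and IDs not listed in the table are reported as errors rather than guessed.

```python
# Array IDs from Table 4-3, keyed by the value in bits 7-0 of EDX.
ARRAY_IDS = {
    0xE0: "Data Cache: Data",
    0xE1: "Data Cache: Linear Tag",
    0xEC: "Data Cache: Physical Tag",
    0xE4: "Instruction Cache: Instructions",
    0xE5: "Instruction Cache: Linear Tag",
    0xED: "Instruction Cache: Physical Tag",
    0xE6: "Instruction Cache: Valid Bits",
    0xE7: "Instruction Cache: Branch-Prediction Bits",
    0xE8: "4-Kbyte TLB: Page",
    0xE9: "4-Kbyte TLB: Linear Tag",
    0xEA: "4-Mbyte TLB: Page",
    0xEB: "4-Mbyte TLB: Linear Tag",
}


def accessed_array(edx: int) -> str:
    """Return the array selected by an array pointer in EDX."""
    array_id = edx & 0xFF  # bits 7-0 hold the array ID
    if array_id not in ARRAY_IDS:
        raise ValueError(f"array ID {array_id:02X}h is not listed in Table 4-3")
    return ARRAY_IDS[array_id]


print(accessed_array(0x000010E5))  # Instruction Cache: Linear Tag
```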
Array Test Data


EAX specifies the test data to be read or written with the 
RDMSR or WRMSR instruction (see Figures 4-3 through 4-8). 
For example, in Figure 4-3 (top) the array pointer in EDX spec- 
ifies a way and set within the data-cache linear tag array (E1h 
in bits 7-0 of the array pointer) or the physical tag array (ECh 
in bits 7-0 of the array pointer). If the linear tag array (E1h) is 
accessed, the data read or written includes the tag and the sta- 
tus bits. The details of the valid fields in EAX are shown in 
Appendix A. 





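[Figure: EDX: Array Pointer, with field boundaries at bits 31, 30, 29, 28, 27, 19, 18, 13, and 12 and the Array ID in bits 7-0. EAX: Test Data for the (E1h) Linear Tag (boundaries at bits 31, 28, 27, and 0) and the (ECh) Physical Tag (boundaries at bits 31, 23, 22, and 0).]
Figure 4-3. Test Formats: Data-Cache Tags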

[Figure: EDX: Array Pointer, with field boundaries at bits 31, 30, 29, 28, 27, 19, 18, 13, 12, 10, 9, 8, 7, and 0 and the Array ID (E0h) in bits 7-0. EAX: Test Data, (E0h) Data in bits 31-0.]
Figure 4-4. Test Formats: Data-Cache Data

[Figure: EDX: Array Pointer, with field boundaries at bits 31, 30, 29, 28, 27, 20, and 19 and the Array ID (E5h, EDh, E6h, or E7h) in bits 7-0. EAX: Test Data for the (E5h) Linear Tag; the (EDh) Physical Tag (boundaries at bits 31, 21, and 20); the (E6h) Valid Bits (boundaries at bits 31, 19, 18, and 0); and the (E7h) Branch-Prediction Bits (boundaries at bits 31, 19, 18, and 0).]
Figure 4-5. Test Formats: Instruction-Cache Tags

[Figure: EDX: Array Pointer, with field boundaries at bits 31, 30, 29, 28, 27, 20, 19, 12, 11, 9, 8, 7, and 0 and the Array ID (E4h) in bits 7-0. EAX: Test Data, (E4h) Instruction Bytes, with boundaries at bits 31, 26, 25, and 0.]
Figure 4-6. Test Formats: Instruction-Cache Instructions

[Figure: EDX: Array Pointer, with field boundaries at bits 31, 30, 29, 28, 27, 13, and 12 and the Array ID (E8h or E9h) in bits 7-0. EAX: Test Data for the (E8h) 4-Kbyte Page and Status (boundaries at bits 31, 22, and 21) and the (E9h) 4-Kbyte Linear Tag (boundaries at bits 31, 20, and 19).]
Figure 4-7. Test Formats: 4-Kbyte TLB

[Figure: EDX: Array Pointer, with field boundaries at bits 31, 30, 29, 28, 27, 8, 7, and 0 and the Array ID (EAh or EBh) in bits 7-0. EAX: Test Data for the (EAh) 4-Mbyte Page and Status (boundaries at bits 31, 12, 11, and 0) and the (EBh) 4-Mbyte Linear Tag.]
Figure 4-8. Test Formats: 4-Mbyte TLB

Debug Registers





The processor implements the standard debug functions and registers (DR7-DR6 and DR3-DR0, often called DR7-DR0) that are available on the 486 processor, plus an I/O breakpoint extension.


Standard Debug Functions 


The debug functions make the processor’s state visible to debug software through four debug registers (DR3-DR0) that are accessed by MOV instructions. Memory addresses can be set as breakpoints so that instruction or data accesses to those addresses invoke one of two debug exceptions (interrupt vector 1 or 3). The debug functions eliminate the need to embed breakpoints in code and allow debugging of ROM as well as RAM.


For details on the standard 486 debug functions and registers, 
see the AMD documentation on the Am486® processor or other 
commercial x86 literature. 


I/O Breakpoint Extension 


The processor supports an I/O breakpoint extension for breakpoints on I/O reads and writes. This function is enabled by setting bit 3 of CR4, as described in “Control Register 4 (CR4) Extensions” on page 2. When enabled, the I/O breakpoint function is invoked by the following:

■ Entering the I/O port number as a breakpoint address (zero-extended to 32 bits) in one of the breakpoint registers, DR3-DR0
■ Entering the bit pattern 10b in the corresponding 2-bit R/W field in DR7

All data breakpoints on the AMD-K5 processor are precise, 
including those encountered in repeated string operations, 
which trap after completing the iteration on which the break- 
point match occurs. 
Enabled breakpoints slow the processor somewhat. When a
data breakpoint is enabled, the processor disables its dual- 
issue load/store operations and performs only single-issue load/ 
store operations. When an instruction breakpoint is enabled, 
instruction issue is completely serialized. 


Debug Compatibility with Pentium Processor 


The differences in debug functions between the AMD-K5 and 
Pentium processors are described in Appendix A of the 
AMD-K5 Processor Technical Reference Manual, order# 18524. 


Branch Tracing 





Branch tracing is enabled by writing 001b to bits 3-1 and setting bit 5 to 1 (disabling branch prediction) in the Hardware Configuration Register (HWCR), as described on page 71. When branch tracing is enabled, the processor drives two branch-trace message special bus cycles immediately after each taken branch instruction is executed. Both special bus cycles have a BE7-BE0 encoding of DFh (1101_1111b). The first special bus cycle identifies the branch source; the second identifies the branch target. The contents of the address and data bus during these special bus cycles are shown in Table 4-4.
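
The HWCR setting described above is a read-modify-write of two fields. A minimal Python sketch of the value computation, assuming the current HWCR contents have already been read with RDMSR; the HWCR's MSR address (page 71) is not assumed here, and the constant and function names are hypothetical.

```python
HWCR_TRACE_MASK = 0b111 << 1      # HWCR bits 3-1
HWCR_TRACE_BRANCHES = 0b001 << 1  # 001b in bits 3-1 selects branch tracing
HWCR_NO_BRANCH_PRED = 1 << 5      # bit 5 = 1 disables branch prediction


def enable_branch_tracing(hwcr: int) -> int:
    """Return the HWCR value to write back with WRMSR."""
    hwcr &= ~HWCR_TRACE_MASK      # clear bits 3-1
    hwcr |= HWCR_TRACE_BRANCHES   # write 001b
    hwcr |= HWCR_NO_BRANCH_PRED   # set bit 5
    return hwcr


print(hex(enable_branch_tracing(0)))  # 0x22
```

Other HWCR bits are preserved, which is why the sketch masks only bits 3-1 instead of writing a constant.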


The branch-trace message special bus cycles are different for the AMD-K5 and Pentium processors, although their BE7-BE0 encodings are the same.

Table 4-4. Branch-Trace Message Special Bus Cycle Fields

Signals   First Special Bus Cycle                       Second Special Bus Cycle
A31       0 = first special bus cycle (source)          1 = second special bus cycle (target)
A30-A29   Not valid                                     Operating mode of target:
                                                        11 = Virtual-8086 mode
                                                        10 = Protected mode
                                                        01 = Not valid
                                                        00 = Real mode
A28       Not valid                                     Default operand size of target segment:
                                                        1 = 32-bit
                                                        0 = 16-bit
A27-A20   0                                             0
A19-A4    Code Segment (CS) selector of branch source   Code Segment (CS) selector of branch target
A3        0                                             0
D31-D0    EIP of branch source                          EIP of branch target
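
A logic-analyzer script could decode the two branch-trace cycles using the field positions in Table 4-4. The Python sketch below assumes the captured address bus is presented as a 32-bit value (A2-A0 are not driven and are ignored) and the data bus as D31-D0; the function name and the returned dictionary keys are hypothetical.

```python
# A30-A29 encodings for the target cycle (01b is not valid).
TARGET_MODES = {0b11: "Virtual-8086", 0b10: "Protected", 0b00: "Real"}


def decode_branch_trace(address: int, data: int) -> dict:
    """Decode one branch-trace message special bus cycle (Table 4-4)."""
    is_target = bool((address >> 31) & 1)   # A31: 0 = source, 1 = target
    info = {
        "cycle": "target" if is_target else "source",
        "cs": (address >> 4) & 0xFFFF,      # A19-A4: CS selector
        "eip": data & 0xFFFFFFFF,           # D31-D0: EIP
    }
    if is_target:  # A30-A28 are valid only in the second (target) cycle
        info["mode"] = TARGET_MODES.get((address >> 29) & 0b11, "not valid")
        info["operand_size"] = 32 if (address >> 28) & 1 else 16
    return info


print(decode_branch_trace(0x0008 << 4, 0x00401000))
# {'cycle': 'source', 'cs': 8, 'eip': 4198400}
```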








Functional-Redundancy Checking 





When FRCMC is asserted at RESET, the processor enters Functional-Redundancy Checking mode as the checker and reports checking errors on the IERR output. If FRCMC is negated at RESET, the processor operates normally; in a functional-redundancy checking arrangement, it acts as the master for a checker.


In the Functional-Redundancy Checking mode, two processors 
have their signals tied together. One processor (the master) 
operates normally. The other processor (the checker) has its 
output and bidirectional signals (except for TDO and IERR) 
floated to detect the state of the master’s signals. The master 
controls instruction fetching and the checker mimics its behav- 
ior by sampling the fetched instructions as they appear on the 
bus. Both processors execute the instructions in lock step. The 
checker compares the state of the master’s output and bidirec- 
tional signals with the state that the checker itself would have 
driven for the same instruction stream. 
Errors detected by the checker are reported on the IERR output
of the checker. If a mismatch occurs on such a comparison,
the checker asserts IERR for one clock, two clocks after the 
detection of the error. Both the master and the checker con- 
tinue running the checking program after an error occurs. No 
action other than the assertion of IERR is taken by the proces- 
sor. On the AMD-K5 processor, the IERR output is reserved 
solely for functional-redundancy checking. No other errors are 
reported on that output. 
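
The IERR timing rule can be illustrated with a small clock-by-clock model. This Python sketch is an illustration under assumptions rather than a timing specification: it treats clock t as the clock in which the checker detects a mismatch, represents IERR as a logical "asserted" flag regardless of the pin's electrical polarity, and uses a hypothetical function name.

```python
def checker_ierr(mismatch_clocks, total_clocks: int) -> list:
    """Per-clock IERR state for a checker (True = asserted)."""
    ierr = [False] * total_clocks
    for t in mismatch_clocks:
        if t + 2 < total_clocks:
            ierr[t + 2] = True  # one-clock assertion, two clocks after detection
    return ierr


# Mismatches detected in clocks 1 and 2 produce IERR in clocks 3 and 4.
print(checker_ierr([1, 2], 6))  # [False, False, False, True, True, False]
```

Nothing else in the model changes after an error, mirroring the statement that both processors keep running and only IERR is asserted.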


Functional-redundancy checking is typically implemented on 
single-processor, fault-monitoring systems (which actually 
have two processors). The master processor runs the opera- 
tional programs and the checker processor is dedicated 
entirely to constant checking. In this arrangement, the test of 
accurate operation consists solely of reporting one or more 
errors. The particular type of error or the instruction causing 
an error is not reported. The arrangement works because the 
processor is entirely deterministic. Speculative prefetching, 
speculative execution, and cache replacement all occur in 
identical ways and at identical times on both processors if their 
signals are tied together so that they run the same program. 


The Functional-Redundancy Checking mode can only be 
exited by the assertion of RESET. Functional-redundancy 
checking cannot be performed in the Hardware Debug Tool 
(HDT) mode. The assertion of FRCMC is not recognized while 
PRDY is asserted. 


Boundary Scan Architecture Support 





The AMD-K5 processor provides test features compatible with the Standard Test Access Port (TAP) and Boundary Scan Test Architecture as defined in the IEEE 1149.1-1990 JTAG Specification. The subsections in this topic include:

■ Boundary Scan Test Functional Description
■ Boundary Scan Architecture
■ Registers
■ The Test Access Port (TAP) Controller
■ JTAG Register Organization
■ JTAG Instructions


The external TAP interface consists of five pins:

■ TCK: The Test Clock input provides the clock for the JTAG test logic.
■ TMS: The Test Mode Select input enables TAP controller operations.
■ TDI: The Test Data Input provides serial input to registers.
■ TDO: The Test Data Output provides serial output from the registers; the signal is tri-stated except when in the Shift-DR or Shift-IR controller states.
■ TRST: The TAP Controller Reset input initializes the TAP controller when asserted Low.


The internal JTAG logic contains the elements listed below: 


■ The Test Access Port (TAP) Controller: Decodes the inputs on the Test Mode Select (TMS) line to control test operations. The TAP is a general-purpose port that provides access to the test support functions built into the AMD-K5 processor.
■ Instruction Register: Accepts instructions from the Test Data Input (TDI) pin. The instruction codes select the specific test or debug operation to be performed or the test data register to be accessed.
■ Implemented Test Data Registers: Boundary Scan Register, Device Identification Register, and Bypass Register. See “JTAG Register Organization” on page 91 for more information.


Note: See Table 4-8 for more information. 


Boundary Scan Test Functional Description 


The boundary scan testing uses a shift register, contained in a 
boundary scan cell, located between the core logic and the I/O 
buffers adjacent to each component pin. Signals at each input 
and output pin are controlled and observed using scan testing 
techniques. The boundary scan cells are interconnected to 
form a shift register chain. This register chain, called a Bound- 
ary Scan Register (BSR), constructs a serial path surrounding 
the core logic. This enables test data to be shifted through the 
boundary scan path. When the system enters the Boundary
Scan Test mode, the BSR chain is directed by a test program to 
pass data along the shift register path. 


If all the components used to construct a circuit or PCB contain 
a boundary scan cell architecture, the resulting serial path can 
be used to perform component interconnect testing. 


Boundary Scan Architecture 


Boundary Scan architecture has four basic elements: 


■ Test Access Port (TAP)
■ TAP Controller
■ Instruction Register (IR). See “Instruction Register” on page 90 for more information.
■ Test Data Registers. See “Registers” on page 90 for more information.


The Instruction and Test Data Registers have separate shift register access paths connected in parallel between the Test Data In (TDI) and Test Data Out (TDO) pins. Path selection and boundary scan cell operation are controlled by the TAP Controller. The controller initializes at start-up, but the Test Reset (TRST) input can asynchronously reset the test logic, if required.


All system integrated circuit (IC) I/O signals are shifted in and 
out through the serial Test Data In and Test Data Out (TDI/ 
TDO) path. The TAP Controller is enabled by the Test Mode 
Select (TMS) input. The Test Clock (TCK), obtained from a sys- 
tem level bus or Automatic Test Equipment (ATE), supplies 
the timing signal for data transfer and system architecture 
operation. 


The dedicated TCK input enables the serial test data path 
between components to be used independently of component- 
specific system clocks. TCK also ensures that test data can be 
moved to or from a chip without changing the state of the on- 
chip system logic. 


The TCK signal is driven by an independent 50% duty cycle clock (generated by the Automatic Test Equipment). If TCK must be stopped (for example, if the ATE must retrieve data from external memory and is unable to keep the clock running), it can be stopped at 0 or 1 indefinitely without causing any change to the test logic state.

To ensure race-free operation, changes on the TAP’s TMS input are clocked into the test logic on the rising edge of TCK. Changes on the TAP’s TDI input are clocked into the selected register (Instruction or Test Data Register) on the rising edge of TCK. The contents of the selected register are shifted out onto the TAP output (TDO) on the falling edge of TCK.

Registers

Boundary scan architectural elements include an Instruction Register (IR) and a group of Test Data Registers (TDRs). These registers have separate shift-register-based serial access paths, connected in parallel between the TDI and TDO pins.

The TDRs are internal registers used by the Boundary Scan Architecture to process the test data. Each Test Data Register is addressed by an instruction scanned into the Instruction Register. The AMD-K5 processor includes the following TDRs:

■ Bypass Register (BR). See “Bypass Register” on page 92.
■ Boundary Scan Register (BSR). See “Boundary Scan Register” on page 91.
■ Device Identification Register (DIR). See “Device Identification Register” on page 91.
■ Built-In Self-Test Result Register (BISTRR). See “RUNBIST” on page 95.

Instruction Register

The 5-bit Instruction Register (IR) is a serial-in, parallel-out register that includes five shift-register-based cells for holding instruction data. The instruction determines which test to run, which data register to access, or both. When the TAP controller enters the Capture IR state, the processor loads the IDCODE instruction in the IR. In the Shift IR state, instructions are shifted into the instruction register on the rising edge of TCK. In the Update-IR state, the instruction is loaded from the serial shift register into the parallel register.

The Test Access Port (TAP) Controller

The TAP controller is a synchronous, finite state machine that controls the test and debug logic sequence of operations. The TAP controller changes state in response to the rising edge of TCK and defaults to the test logic reset state at power-up. Reinitialization to the test logic reset state is accomplished by holding the TMS pin High for five TCK periods.
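
The five-clock reset rule follows from the TAP controller state diagram defined by IEEE 1149.1, which this section does not reproduce. The Python sketch below encodes that standard 16-state diagram (state names as in the standard) and checks that five rising TCK edges with TMS High reach Test-Logic-Reset from every state. TRST and the actions performed in each state are not modeled, and the function name is hypothetical.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS = 0, if TMS = 1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock_tap(state: str, tms_bits) -> str:
    """Advance the controller one state per rising TCK edge."""
    for tms in tms_bits:
        state = TAP_NEXT[state][1 if tms else 0]
    return state


# Five TCK periods with TMS High reset the controller from any state.
print(all(clock_tap(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT))  # True
```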


JTAG Register Organization 


All registers in the JTAG logic consist of the following two register ranks:

■ A shift register
■ A parallel output register fed by the shift register

Parallel input data is loaded into the shift register when the TAP controller exits the Capture state (Capture DR or Capture IR). The shift register then shifts data from TDI to TDO when in the Shift state (Shift DR or Shift IR). The output register holds the current data while new data is shifted into the shift register. The contents of the output register are updated when the TAP controller exits the Update state (Update DR or Update IR). The three registers described in this section are:

■ Boundary Scan Register
■ Device Identification Register
■ Bypass Register

Boundary Scan Register

The Boundary Scan Register (BSR) is a 261-bit shift register with cells connected to all input and output pins and containing cells for tri-state I/O control. This enables serial data to be loaded into or read from the processor boundary scan area.

Output cells determine the value of the signal driven on the corresponding pin. Input cells only capture data. The EXTEST and SAMPLE/PRELOAD instructions can operate the BSR.

Device Identification Register

The format of the Device Identification Register (DIR) is shown in Table 4-5. The fields include the following values:

■ Version Number: This is incremented by AMD manufacturing for each major revision of silicon.
■ Bond Option: The two bits of the bond option depend on how the part is bonded at the factory.
■ Part Number: This identifies the specific processor model.
■ Manufacturer: This is actually only 11 bits (11-1). The least-significant bit, bit 0, is always set to 1, as specified by the IEEE standard.


Table 4-5. Test Access Port (TAP) ID Code

Version        Bond Option   Unused         Part Number       Manufacturer
(Bits 31-28)   (Bit 27)      (Bits 26-24)   (Bits 23-12)      (Bits 11-0)
Xh             Xb            000b           50Xh = Model 0    001h
                                            51Xh = Model 1
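
A boundary-scan tool that shifts out the ID code (see IDCODE) could split it into the Table 4-5 fields as follows. This Python sketch follows the bit positions shown in Table 4-5, which place the bond option in bit 27 alone; the function name is hypothetical, and the ID value in the example is made up for illustration, not a real AMD-K5 ID code.

```python
# Part Number prefixes from Table 4-5: 50Xh = Model 0, 51Xh = Model 1.
MODEL_BY_PART_PREFIX = {0x50: 0, 0x51: 1}


def decode_idcode(idcode: int) -> dict:
    """Split a 32-bit Device Identification Register value into fields."""
    part = (idcode >> 12) & 0xFFF                       # bits 23-12
    return {
        "version": (idcode >> 28) & 0xF,                # bits 31-28
        "bond_option": (idcode >> 27) & 0x1,            # bit 27
        "unused": (idcode >> 24) & 0x7,                 # bits 26-24, 000b
        "part_number": part,
        "model": MODEL_BY_PART_PREFIX.get(part >> 4),   # None if not 50Xh/51Xh
        "manufacturer": (idcode >> 1) & 0x7FF,          # bits 11-1
        "ieee_lsb": idcode & 1,                         # bit 0, always 1
    }


fields = decode_idcode(0x28513001)  # hypothetical ID code value
print(hex(fields["part_number"]), fields["model"])  # 0x513 1
```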























Bypass Register

The Bypass Register, a 1-bit shift register, provides the shortest path between TDI and TDO. When the component is not performing a test operation, this path is selected to allow transfer of test data to and from other components on the board. The Bypass Register is also selected during the HIGHZ, ALL1, ALL0, and BYPASS tests and for any unused instruction codes.
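
The two-rank organization described under “JTAG Register Organization” can be modeled behaviorally. This Python sketch is not a description of the AMD-K5 circuitry: it assumes bit 0 of the shift rank is the bit nearest TDO (consistent with bit 0 of the BIST result being shifted out first), and the class and method names are hypothetical. A 1-bit instance behaves like the Bypass Register path, delaying TDI by one TCK.

```python
class JtagRegister:
    """Two-rank JTAG register: a shift rank feeding a parallel output rank."""

    def __init__(self, width: int):
        self.width = width
        self.shift = 0   # shift rank; bit 0 is nearest TDO
        self.output = 0  # parallel output rank

    def capture(self, parallel_in: int) -> None:
        """Capture state: load parallel input data into the shift rank."""
        self.shift = parallel_in & ((1 << self.width) - 1)

    def shift_bit(self, tdi: int) -> int:
        """Shift state, one TCK: bit 0 leaves on TDO, TDI enters at the top."""
        tdo = self.shift & 1
        self.shift = (self.shift >> 1) | ((tdi & 1) << (self.width - 1))
        return tdo

    def update(self) -> None:
        """Update state: copy the shift rank into the output rank."""
        self.output = self.shift


reg = JtagRegister(4)
reg.capture(0b1010)
tdo = [reg.shift_bit(b) for b in (1, 1, 0, 0)]
reg.update()
print(tdo, bin(reg.output))  # [0, 1, 0, 1] 0b11
```

The output rank stays unchanged while bits are shifted and changes only on `update()`, which is the property the two-rank design provides.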


The processor supports all three IEEE-mandatory instructions 
(BYPASS, SAMPLE/PRELOAD, EXTEST), three IEEE- 
optional instructions (IDCODE, HIGHZ, RUNBIST), and three 
instructions unique to the AMD-K5 processor (ALL1, ALL0,
USEHDT). Table 4-6 shows the complete set of public TAP 
instructions supported by the processor. The processor also 
implements several private manufacturing test instructions. 


The IEEE standard describes the mandatory and optional 
instructions. The ALL1 and ALL0 instructions simply force all
outputs and bidirectionals High or Low. The USEHDT instruc- 
tion is described on page 112. Any instruction encodings not 
shown in Table 4-6 select the BYPASS instruction. 
Table 4-6. Public TAP Instructions

Instruction      Encoding    Register   Description
EXTEST           00000       BSR        As defined by the IEEE standard
SAMPLE/PRELOAD   00001       BSR        As defined by the IEEE standard
IDCODE           00010       DIR        As defined by the IEEE standard
HIGHZ            00011       BR         As defined by the IEEE standard
ALL1             00100       BR         Forces all outputs and bidirectionals High
ALL0             00101       BR         Forces all outputs and bidirectionals Low
USEHDT           00110       HDTR       Accesses the Hardware Debug Tool (HDT) (Note 1); see page 112
RUNBIST          00111       BISTRR     As defined by the IEEE standard
BYPASS           11111       BR         As defined by the IEEE standard
BYPASS           Undefined   BR         Undefined instruction encodings select the BYPASS instruction
Notes:
1. Documentation on the Hardware Debug Tool (HDT) is available from AMD under a nondisclosure agreement.
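
Table 4-6, together with the rule that unlisted encodings select BYPASS, can be expressed as a decoder for the 5-bit Instruction Register. A minimal Python sketch with a hypothetical function name; the private manufacturing test instructions mentioned above are not modeled because their encodings are not given.

```python
# Public TAP instructions from Table 4-6: encoding -> (instruction, register).
TAP_INSTRUCTIONS = {
    0b00000: ("EXTEST", "BSR"),
    0b00001: ("SAMPLE/PRELOAD", "BSR"),
    0b00010: ("IDCODE", "DIR"),
    0b00011: ("HIGHZ", "BR"),
    0b00100: ("ALL1", "BR"),
    0b00101: ("ALL0", "BR"),
    0b00110: ("USEHDT", "HDTR"),
    0b00111: ("RUNBIST", "BISTRR"),
    0b11111: ("BYPASS", "BR"),
}


def decode_ir(code: int) -> tuple:
    """Return (instruction, selected register) for a 5-bit IR value."""
    if not 0 <= code <= 0b11111:
        raise ValueError("the Instruction Register is 5 bits wide")
    # Encodings not listed in Table 4-6 select BYPASS.
    return TAP_INSTRUCTIONS.get(code, ("BYPASS", "BR"))


print(decode_ir(0b00010), decode_ir(0b01000))  # ('IDCODE', 'DIR') ('BYPASS', 'BR')
```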








EXTEST

The EXTEST instruction permits circuits outside the component package to be tested. A common use of the EXTEST instruction is the testing of board interconnects. Boundary scan register cells at output pins are used to apply test stimuli, while those at input pins capture test results. Depending on the value loaded into their control cell in the boundary scan register, the I/O pins are established as input or output. Inputs to the core logic retain the logic value set prior to execution of the EXTEST instruction. Upon exiting EXTEST, input pins are reconnected to the package pins.

SAMPLE/PRELOAD

The SAMPLE/PRELOAD instruction performs two functions, as follows:

■ Capturing an instantaneous picture of the normal operation of the device being tested. This function occurs if the instruction is executed while the TAP controller is in the Capture DR state and causes the Boundary Scan Register to sample the values present at the device pins.
■ Preloading data to the device pins to be driven to the board by the EXTEST instruction. This function occurs if the instruction is executed while the TAP controller is in the Update DR state and causes data to be preloaded to the device pins from the Boundary Scan Register.
IDCODE

The execution of the IDCODE instruction connects the device identification register between TDI and TDO. Upon such connection, the device identification code can be shifted out of the register.

HIGHZ

This instruction forces all output and bidirectional pins into a tri-state condition. When this instruction is selected, the bypass register is selected for shifting between TDI and TDO. A signal called HIZEXT is responsible for forcing the tri-state to occur. This signal is generated in the TAP block, underneath JTAG_BIST, and goes to the PAD_TOP block.

ALL1

This instruction forces all output and bidirectional pins to a High logic level.

The ALL1 instruction, like the HIGHZ instruction, selects the bypass register for shifting between TDI and TDO. A signal called ALL1 is responsible for forcing the pins to a High state. This signal is generated in the TAP block underneath JTAG_BIST and goes to the PAD_TOP block. In the PAD_TOP block, this signal goes to boundary scan cells called BSLCD_OUT. The DOUT pins of the BSLCD_OUT cells are forced High when ALL1 is High. When the SELPDR signal is High, it selects the boundary scan cells as the source for driving the outputs. The SELPDR signal is also generated in the TAP block underneath JTAG_BIST and goes to the PAD_TOP block.

ALL0

This instruction forces all output and bidirectional pins to a Low logic level.

The ALL0 instruction, like the HIGHZ instruction, selects the bypass register for shifting between TDI and TDO. A signal called ALL0 is responsible for forcing the pins to a Low state. This signal is generated in the TAP block underneath JTAG_BIST and goes to the PAD_TOP block. In the PAD_TOP block, this signal goes to boundary scan cells called BSLCD_OUT. The DOUT pins of the BSLCD_OUT cells are forced Low when ALL0 is High. When the SELPDR signal is High, it selects the boundary scan cells as the source for driving the outputs. The SELPDR signal is also generated in the TAP block underneath JTAG_BIST and goes to the PAD_TOP block.
RUNBIST

This version of BIST is similar to the normal BIST mode, except that it is started by shifting in a TAP instruction. This instruction should behave according to the rules of the IEEE 1149.1 definition of RUNBIST.


When the RUNBIST instruction is updated into the instruction 
register, a signal from the TAP_RTL block called JTGBIST is 
asserted High. This signal goes to the PAD_TOP and TESTC- 
TRL blocks. In PAD_TOP, this signal goes to the BRNBIST 
block and causes both INIT_SAMP and RUNBIST to be 
asserted. To the rest of the chip, it looks like a normal BIST 
operation is taking place. The JTGBIST signal also goes to the 
TESTCTRL block so that the BIST controller knows that the 
BIST operation was initiated from the TAP controller. This is 
necessary because the BIST results do not get transferred to 
the EAX register in this mode of operation. The JTAG_BIST block also asserts the RESET_TAP pin to the CLOCKS block for 15 system clock cycles to emulate an external reset.


The pattern that is shifted into the boundary scan ring, prior to 
the selection of the RUNBIST instruction, is driven at output 
and bidirectional cells during the duration of the instruction. 
The results of the execution of RUNBIST are saved in the BIST results register, which is 9 bits long and has the same format as the least significant 9 bits of the EAX register. This register is selected for
shifting between TDI and TDO and can be shifted out after the 
completion of BIST. Bit 0 (CACHE data status) is shifted out 
first. The BIST results should be independent of signals 
received at non-clock input pins (except for RESET). 
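
Because bit 0 of the BIST results register leaves TDO first, software that samples TDO during the shift must place each successive bit one position higher. A minimal Python sketch with a hypothetical function name; it reassembles the 9-bit value only and does not interpret the individual status bits.

```python
def bist_result_from_tdo(tdo_bits) -> int:
    """Rebuild the 9-bit BIST result; tdo_bits[0] is the first bit shifted out."""
    bits = list(tdo_bits)
    if len(bits) != 9:
        raise ValueError("the BIST results register is 9 bits long")
    value = 0
    for position, bit in enumerate(bits):
        value |= (bit & 1) << position  # first bit out is bit 0
    return value


print(hex(bist_result_from_tdo([1, 0, 0, 0, 0, 0, 0, 0, 1])))  # 0x101
```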


BYPASS

The execution of the BYPASS instruction connects the bypass
register between TDI and TDO, bypassing the test logic. 
Because of the pull-up resistor on the TDI input, the bypass 
register is selected if there is an open circuit in the board-level 
test data path following an instruction scan cycle. Any unused 
instruction bit patterns cause the bypass register to be 
selected for shifting between TDI and TDO. 
The control bits listed in Table 4-8 have the characteristics described in Table 4-7.

Table 4-7. Control Bit Definitions

Bit   Definition
144   Controls the direction of the Data bus (D63-D0). If the bit is set to 1, the bus acts as an input. If the bit is set to 0, the bus acts as an output.
213   Controls the direction of the Address bus (A31-A3) and Address Parity (AP). If the bit is set to 1, the bus acts as an input. If the bit is set to 0, the bus acts as an output.
257   Controls pins that can be tri-stated but never act as inputs. If the bit is set to 1, the pin is tri-stated. If the bit is set to 0, the pin acts as an output.
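
The data-bus rows of Table 4-8 (Model 0) follow a regular pattern: for each byte lane from 7 down to 0, the parity pin DPn comes first and then the eight data pins from most to least significant, each pin taking an output cell followed by an input cell. The Python sketch below regenerates that mapping and reads the bit-144 direction control from Table 4-7. It covers only the Model 0 data bus, holds the BSR as a plain integer, and uses hypothetical function names.

```python
DATA_DIRECTION_BIT = 144  # Table 4-7: 1 = data bus is an input, 0 = output


def data_bus_cells() -> dict:
    """Map DP7-DP0 and D63-D0 to (output cell, input cell) BSR bit numbers."""
    cells = {}
    bit = 0
    for lane in range(7, -1, -1):  # byte lanes 7 down to 0
        pins = [f"DP{lane}"] + [f"D{8 * lane + i}" for i in range(7, -1, -1)]
        for pin in pins:
            cells[pin] = (bit, bit + 1)  # output cell, then input cell
            bit += 2
    return cells


def data_bus_is_input(bsr: int) -> bool:
    """Read the bit-144 direction control from a BSR image held as an int."""
    return bool((bsr >> DATA_DIRECTION_BIT) & 1)


cells = data_bus_cells()
print(cells["DP7"], cells["D55"], cells["D0"])  # (0, 1) (20, 21) (142, 143)
```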
Table 4-8. Boundary Scan Register Bit Definitions (Model 0) 
Bit Pin Name Comments 
0 DP7 Output Cell: Controlled by bit 144 
1 DP7 Input Cell 
2 D63 Output Cell: Controlled by bit 144 
3 D63 Input Cell 
4 D62 Output Cell: Controlled by bit 144 
5 D62 Input Cell 
6 D61 Output Cell: Controlled by bit 144 
7 D61 Input Cell 
8 D60 Output Cell: Controlled by bit 144 
9 D60 Input Cell 
10 D59 Output Cell: Controlled by bit 144 
11 D59 Input Cell 
12 D58 Output Cell: Controlled by bit 144 
13 D58 Input Cell 
14 D57 Output Cell: Controlled by bit 144 
15 D57 Input Cell 
16 D56 Output Cell: Controlled by bit 144 
17 D56 Input Cell 
18 DP6 Output Cell: Controlled by bit 144 
19 DP6 Input Cell 
20 D55 Output Cell: Controlled by bit 144 
21 D55 Input Cell 

22 D54 Output Cell: Controlled by bit 144 
23 D54 Input Cell 

24 D53 Output Cell: Controlled by bit 144 
25 D53 Input Cell 

26 D52 Output Cell: Controlled by bit 144 
27 D52 Input Cell 

28 D51 Output Cell: Controlled by bit 144 
29 D51 Input Cell 

30 D50 Output Cell: Controlled by bit 144 
31 D50 Input Cell 

32 D49 Output Cell: Controlled by bit 144 
33 D49 Input Cell 

34 D48 Output Cell: Controlled by bit 144 
35 D48 Input Cell 

36 DP5 Output Cell: Controlled by bit 144 
37 DP5 Input Cell 

38 D47 Output Cell: Controlled by bit 144 
39 D47 Input Cell 

40 D46 Output Cell: Controlled by bit 144 
41 D46 Input Cell

42 D45 Output Cell: Controlled by bit 144 
43 D45 Input Cell 

44 D44 Output Cell: Controlled by bit 144 
45 D44 Input Cell 

46 D43 Output Cell: Controlled by bit 144 
47 D43 Input Cell 

48 D42 Output Cell: Controlled by bit 144 
49 D42 Input Cell 

50 D41 Output Cell: Controlled by bit 144 
51 D41 Input Cell 

52 D40 Output Cell: Controlled by bit 144 
53 D40 Input Cell 

54 DP4 Output Cell: Controlled by bit 144 
55 DP4 Input Cell 

56 D39 Output Cell: Controlled by bit 144 
57 D39 Input Cell 

58 D38 Output Cell: Controlled by bit 144 
59 D38 Input Cell 

60 D37 Output Cell: Controlled by bit 144 
61 D37 Input Cell 

62 D36 Output Cell: Controlled by bit 144 
63 D36 Input Cell 

64 D35 Output Cell: Controlled by bit 144 
65 D35 Input Cell 

66 D34 Output Cell: Controlled by bit 144 
67 D34 Input Cell 

68 D33 Output Cell: Controlled by bit 144 
69 D33 Input Cell 

70 D32 Output Cell: Controlled by bit 144 
71 D32 Input Cell 

72 DP3 Output Cell: Controlled by bit 144 
73 DP3 Input Cell 

74 D31 Output Cell: Controlled by bit 144 
75 D31 Input Cell 

76 D30 Output Cell: Controlled by bit 144 
77 D30 Input Cell 

78 D29 Output Cell: Controlled by bit 144 
79 D29 Input Cell 

80 D28 Output Cell: Controlled by bit 144 
81 D28 Input Cell 

82 D27 Output Cell: Controlled by bit 144 
83 D27 Input Cell 

84 D26 Output Cell: Controlled by bit 144 
85 D26 Input Cell 

86 D25 Output Cell: Controlled by bit 144 
87 D25 Input Cell 

88 D24 Output Cell: Controlled by bit 144 
89 D24 Input Cell 

90 DP2 Output Cell: Controlled by bit 144 
91 DP2 Input Cell 

92 D23 Output Cell: Controlled by bit 144 
93 D23 Input Cell 

94 D22 Output Cell: Controlled by bit 144 
95 D22 Input Cell 

96 D21 Output Cell: Controlled by bit 144 
97 D21 Input Cell 

98 D20 Output Cell: Controlled by bit 144 
99 D20 Input Cell 

100 D19 Output Cell: Controlled by bit 144 
101 D19 Input Cell 

102 D18 Output Cell: Controlled by bit 144 
103 D18 Input Cell 

104 D17 Output Cell: Controlled by bit 144 
105 D17 Input Cell 

106 D16 Output Cell: Controlled by bit 144 
107 D16 Input Cell 

108 DP1 Output Cell: Controlled by bit 144 
109 DP1 Input Cell 

110 D15 Output Cell: Controlled by bit 144 
111 D15 Input Cell 

112 D14 Output Cell: Controlled by bit 144 
113 D14 Input Cell 

114 D13 Output Cell: Controlled by bit 144 
115 D13 Input Cell 

116 D12 Output Cell: Controlled by bit 144 
117 D12 Input Cell 

118 D11 Output Cell: Controlled by bit 144 
119 D11 Input Cell

120 D10 Output Cell: Controlled by bit 144 
121 D10 Input Cell 

122 D9 Output Cell: Controlled by bit 144 
123 D9 Input Cell 
124 D8 Output Cell: Controlled by bit 144 
125 D8 Input Cell 
126 DP0 Output Cell: Controlled by bit 144
127 DP0 Input Cell
128 D7 Output Cell: Controlled by bit 144 
129 D7 Input Cell 
130 D6 Output Cell: Controlled by bit 144 
131 D6 Input Cell 
132 D5 Output Cell: Controlled by bit 144 
133 D5 Input Cell 
134 D4 Output Cell: Controlled by bit 144 
135 D4 Input Cell 
136 D3 Output Cell: Controlled by bit 144 
137 D3 Input Cell 
138 D2 Output Cell: Controlled by bit 144 
139 D2 Input Cell 
140 D1 Output Cell: Controlled by bit 144 
141 D1 Input Cell 
142 D0 Output Cell: Controlled by bit 144
143 D0 Input Cell
144 Control Direction Control. See Table 4-7. 
145 STPCLK Input Cell 
146 FRCMC Input Cell 
147 PEN Input Cell 
148 IGNNE Input Cell 
149 BF Input Cell 
150 INIT Input Cell 
151 SMI Input Cell 
152 R/S Input Cell 
153 NMI Input Cell 
154 INTR Input Cell 
155 A21 Output Cell: Controlled by bit 213 
156 A21 Input Cell 
Table 4-8. Boundary Scan Register Bit Definitions (Model 0) (continued) 

Bit Pin Name Comments 
157 A22 Output Cell: Controlled by bit 213 
158 A22 Input Cell 
159 A23 Output Cell: Controlled by bit 213 
160 A23 Input Cell 
161 A24 Output Cell: Controlled by bit 213 
162 A24 Input Cell 
163 A25 Output Cell: Controlled by bit 213 
164 A25 Input Cell 
165 A26 Output Cell: Controlled by bit 213 
166 A26 Input Cell 
167 A27 Output Cell: Controlled by bit 213 
168 A27 Input Cell 
169 A28 Output Cell: Controlled by bit 213 
170 A28 Input Cell 
171 A29 Output Cell: Controlled by bit 213 
172 A29 Input Cell 
173 A30 Output Cell: Controlled by bit 213 
174 A30 Input Cell 
175 A31 Output Cell: Controlled by bit 213 
176 A31 Input Cell 
177 A3 Output Cell: Controlled by bit 213 
178 A3 Input Cell 
179 A4 Output Cell: Controlled by bit 213 
180 A4 Input Cell 
181 A5 Output Cell: Controlled by bit 213 
182 A5 Input Cell 
183 A6 Output Cell: Controlled by bit 213 
184 A6 Input Cell 
185 A7 Output Cell: Controlled by bit 213 
186 A7 Input Cell 
187 A8 Output Cell: Controlled by bit 213 
188 A8 Input Cell 
189 A9 Output Cell: Controlled by bit 213 
190 A9 Input Cell 
Table 4-8. Boundary Scan Register Bit Definitions (Model 0) (continued) 

Bit Pin Name Comments 
191 A10 Output Cell: Controlled by bit 213 
192 A10 Input Cell 
193 A11 Output Cell: Controlled by bit 213 
194 A11 Input Cell 
195 A12 Output Cell: Controlled by bit 213 
196 A12 Input Cell 
197 A13 Output Cell: Controlled by bit 213 
198 A13 Input Cell 
199 A14 Output Cell: Controlled by bit 213 
200 A14 Input Cell 
201 A15 Output Cell: Controlled by bit 213 
202 A15 Input Cell 
203 A16 Output Cell: Controlled by bit 213 
204 A16 Input Cell 
205 A17 Output Cell: Controlled by bit 213 
206 A17 Input Cell 
207 A18 Output Cell: Controlled by bit 213 
208 A18 Input Cell 
209 A19 Output Cell: Controlled by bit 213 
210 A19 Input Cell 
211 A20 Output Cell: Controlled by bit 213 
212 A20 Input Cell 
213 Control Direction Control. See Table 4-7. 
214 SCYC Output Cell: Controlled by bit 257 
215 RESET Input Cell 
216 BE7 Output Cell: Controlled by bit 257 
217 BE6 Output Cell: Controlled by bit 257 
218 BE5 Output Cell: Controlled by bit 257 
219 BE4 Output Cell: Controlled by bit 257 
220 BE3 Output Cell: Controlled by bit 257 
221 BE2 Output Cell: Controlled by bit 257 
222 BE1 Output Cell: Controlled by bit 257 
223 BE0 Output Cell: Controlled by bit 257 
224 W/R Output Cell: Controlled by bit 257 
Table 4-8. Boundary Scan Register Bit Definitions (Model 0) (continued) 

Bit Pin Name Comments 

225 HIT Output Cell 

226 CLK Clock 

227 ADSC Output Cell: Controlled by bit 257 
228 ADS Output Cell: Controlled by bit 257 
229 CACHE Output Cell: Controlled by bit 257 
230 BRDYC Input Cell 

231 BRDY Input Cell 

232 EAD Input Cell 

233 PWT Output Cell: Controlled by bit 257 
234 LOCK Output Cell: Controlled by bit 257 
235 PCD Output Cell: Controlled by bit 257 
236 WB/WT Input Cell 

237 HITM Output Cell 

238 KEN Input Cell 

239 AHOLD Input Cell 

240 BOFF Input Cell 

241 HLDA Output Cell 

242 HOLD Input Cell 

243 NA Input Cell 

244 EWBE Input Cell 

245 M/IO Output Cell: Controlled by bit 257 
246 FLUSH Input Cell 

247 A20M Input Cell 

248 BUSCHK Input Cell 

249 AP Output Cell: Controlled by bit 213 
250 AP Input Cell 

251 D/C Output Cell: Controlled by bit 257 
252 BREQ Output Cell 

253 SMIACT Output Cell 

254 PCHK Output Cell 

255 APCHK Output Cell 

256 PRDY Output Cell 

257 Control Direction Control. See Table 4-7. 
258 INV Input Cell 
Table 4-8. Boundary Scan Register Bit Definitions (Model 0) (continued) 

Bit Pin Name Comments 
259 FERR Output Cell 
260 IERR Output Cell 
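The data-bus rows in Tables 4-8 and 4-9 follow a fixed pattern: starting at bit 0, each byte lane k contributes its parity pin DPk and then D(8k+7) down to D(8k), and every pin occupies an output cell at an even position followed by its input cell. The Python sketch below derives the cell positions from that pattern instead of a lookup table. The function and constant names are hypothetical, the sketch assumes both models share the layout of bits 0-143 (the rows shown for both models agree), and it does not model the direction-control encoding, which Table 4-7 defines.

```python
DATA_DIRECTION_CONTROL = 144  # "Control" cell named by every data/parity output cell


def data_cell_bits(pin: str) -> tuple[int, int]:
    """Return (output_cell, input_cell) for a data pin ("D23") or parity pin ("DP2").

    Pattern taken from the tables: lane 7 starts at bit 0, each lane is
    DPk followed by D(8k+7)..D(8k), and each pin uses two cells (output, input).
    """
    if pin.startswith("DP"):
        lane = int(pin[2:])
        slot = 0                       # parity cell leads its lane
    elif pin.startswith("D"):
        bit = int(pin[1:])
        lane = bit // 8
        slot = 1 + (7 - bit % 8)       # D(8k+7) follows DPk; D(8k) is last
    else:
        raise ValueError(f"not a data or parity pin: {pin}")
    if not 0 <= lane <= 7:
        raise ValueError(f"byte lane out of range: {pin}")
    position = 9 * (7 - lane) + slot   # pin index counted from bit 0
    return 2 * position, 2 * position + 1
```

For example, `data_cell_bits("D46")` gives `(40, 41)`, matching rows 40 and 41 of Table 4-9.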

Table 4-9. Boundary Scan Register Bit Definitions (Model 1) 

Bit Pin Name Comments 

0 DP7 Output Cell: Controlled by bit 144 
1 DP7 Input Cell 

2 D63 Output Cell: Controlled by bit 144 
3 D63 Input Cell 

4 D62 Output Cell: Controlled by bit 144 
5 D62 Input Cell 

6 D61 Output Cell: Controlled by bit 144 
7 D61 Input Cell 

8 D60 Output Cell: Controlled by bit 144 
9 D60 Input Cell 

10 D59 Output Cell: Controlled by bit 144 
11 D59 Input Cell 

12 D58 Output Cell: Controlled by bit 144 
13 D58 Input Cell 

14 D57 Output Cell: Controlled by bit 144 
15 D57 Input Cell 

16 D56 Output Cell: Controlled by bit 144 
17 D56 Input Cell 

18 DP6 Output Cell: Controlled by bit 144 
19 DP6 Input Cell 

20 D55 Output Cell: Controlled by bit 144 
21 D55 Input Cell 

22 D54 Output Cell: Controlled by bit 144 
23 D54 Input Cell 

24 D53 Output Cell: Controlled by bit 144 
25 D53 Input Cell 

26 D52 Output Cell: Controlled by bit 144 
27 D52 Input Cell 
Table 4-9. Boundary Scan Register Bit Definitions (Model 1) (continued) 

Bit Pin Name Comments 

28 D51 Output Cell: Controlled by bit 144 
29 D51 Input Cell 

30 D50 Output Cell: Controlled by bit 144 
31 D50 Input Cell 

32 D49 Output Cell: Controlled by bit 144 
33 D49 Input Cell 

34 D48 Output Cell: Controlled by bit 144 
35 D48 Input Cell 

36 DP5 Output Cell: Controlled by bit 144 
37 DP5 Input Cell 

38 D47 Output Cell: Controlled by bit 144 
39 D47 Input Cell 

40 D46 Output Cell: Controlled by bit 144 
41 D46 Input Cell 

42 D45 Output Cell: Controlled by bit 144 
43 D45 Input Cell 

44 D44 Output Cell: Controlled by bit 144 
45 D44 Input Cell 

46 D43 Output Cell: Controlled by bit 144 
47 D43 Input Cell 

48 D42 Output Cell: Controlled by bit 144 
49 D42 Input Cell 

50 D41 Output Cell: Controlled by bit 144 
51 D41 Input Cell 

52 D40 Output Cell: Controlled by bit 144 
53 D40 Input Cell 

54 DP4 Output Cell: Controlled by bit 144 
55 DP4 Input Cell 

56 D39 Output Cell: Controlled by bit 144 
57 D39 Input Cell 

58 D38 Output Cell: Controlled by bit 144 
59 D38 Input Cell 

60 D37 Output Cell: Controlled by bit 144 
61 D37 Input Cell 
Table 4-9. Boundary Scan Register Bit Definitions (Model 1) (continued) 

Bit Pin Name Comments 

62 D36 Output Cell: Controlled by bit 144 
63 D36 Input Cell 

64 D35 Output Cell: Controlled by bit 144 
65 D35 Input Cell 

66 D34 Output Cell: Controlled by bit 144 
67 D34 Input Cell 

68 D33 Output Cell: Controlled by bit 144 
69 D33 Input Cell 

70 D32 Output Cell: Controlled by bit 144 
71 D32 Input Cell 

72 DP3 Output Cell: Controlled by bit 144 
73 DP3 Input Cell 

74 D31 Output Cell: Controlled by bit 144 
75 D31 Input Cell 

76 D30 Output Cell: Controlled by bit 144 
77 D30 Input Cell 

78 D29 Output Cell: Controlled by bit 144 
79 D29 Input Cell 

80 D28 Output Cell: Controlled by bit 144 
81 D28 Input Cell 

82 D27 Output Cell: Controlled by bit 144 
83 D27 Input Cell 

84 D26 Output Cell: Controlled by bit 144 
85 D26 Input Cell 

86 D25 Output Cell: Controlled by bit 144 
87 D25 Input Cell 

88 D24 Output Cell: Controlled by bit 144 
89 D24 Input Cell 

90 DP2 Output Cell: Controlled by bit 144 
91 DP2 Input Cell 

92 D23 Output Cell: Controlled by bit 144 
93 D23 Input Cell 

94 D22 Output Cell: Controlled by bit 144 
95 D22 Input Cell 
Table 4-9. Boundary Scan Register Bit Definitions (Model 1) (continued) 

Bit Pin Name Comments 

96 D21 Output Cell: Controlled by bit 144 
97 D21 Input Cell 

98 D20 Output Cell: Controlled by bit 144 
99 D20 Input Cell 

100 D19 Output Cell: Controlled by bit 144 
101 D19 Input Cell 

102 D18 Output Cell: Controlled by bit 144 
103 D18 Input Cell 

104 D17 Output Cell: Controlled by bit 144 
105 D17 Input Cell 

106 D16 Output Cell: Controlled by bit 144 
107 D16 Input Cell 

108 DP1 Output Cell: Controlled by bit 144 
109 DP1 Input Cell 

110 D15 Output Cell: Controlled by bit 144 
111 D15 Input Cell 

112 D14 Output Cell: Controlled by bit 144 
113 D14 Input Cell 

114 D13 Output Cell: Controlled by bit 144 
115 D13 Input Cell 

116 D12 Output Cell: Controlled by bit 144 
117 D12 Input Cell 

118 D11 Output Cell: Controlled by bit 144 
119 D11 Input Cell 

120 D10 Output Cell: Controlled by bit 144 
121 D10 Input Cell 

122 D9 Output Cell: Controlled by bit 144 
123 D9 Input Cell 

124 D8 Output Cell: Controlled by bit 144 
125 D8 Input Cell 

126 DP0 Output Cell: Controlled by bit 144 
127 DP0 Input Cell 

128 D7 Output Cell: Controlled by bit 144 
129 D7 Input Cell 
Table 4-9. Boundary Scan Register Bit Definitions (Model 1) (continued) 

Bit Pin Name Comments 

130 D6 Output Cell: Controlled by bit 144 
131 D6 Input Cell 

132 D5 Output Cell: Controlled by bit 144 
133 D5 Input Cell 

134 D4 Output Cell: Controlled by bit 144 
135 D4 Input Cell 

136 D3 Output Cell: Controlled by bit 144 
137 D3 Input Cell 

138 D2 Output Cell: Controlled by bit 144 
139 D2 Input Cell 

140 D1 Output Cell: Controlled by bit 144 
141 D1 Input Cell 

142 D0 Output Cell: Controlled by bit 144 
143 D0 Input Cell 

144 Control Direction Control. See Table 4-7. 
145 STPCLK Input Cell 

146 BF1 Input Cell 

147 FRCMC Input Cell 

148 PEN Input Cell 

149 IGNNE Input Cell 

150 BF0 Input Cell 

151 INIT Input Cell 

152 SMI Input Cell 

153 R/S Input Cell 

154 NMI Input Cell 

155 INTR Input Cell 

156 A21 Output Cell: Controlled by bit 213 
157 A21 Input Cell 

158 A22 Output Cell: Controlled by bit 213 
159 A22 Input Cell 

160 A23 Output Cell: Controlled by bit 213 
161 A23 Input Cell 

162 A24 Output Cell: Controlled by bit 213 
163 A24 Input Cell 
Table 4-9. Boundary Scan Register Bit Definitions (Model 1) (continued) 

Bit Pin Name Comments 
164 A25 Output Cell: Controlled by bit 213 
165 A25 Input Cell 
166 A26 Output Cell: Controlled by bit 213 
167 A26 Input Cell 
168 A27 Output Cell: Controlled by bit 213 
169 A27 Input Cell 
170 A28 Output Cell: Controlled by bit 213 
171 A28 Input Cell 
172 A29 Output Cell: Controlled by bit 213 
173 A29 Input Cell 
174 A30 Output Cell: Controlled by bit 213 
175 A30 Input Cell 
176 A31 Output Cell: Controlled by bit 213 
177 A31 Input Cell 
178 A3 Output Cell: Controlled by bit 213 
179 A3 Input Cell 
180 A4 Output Cell: Controlled by bit 213 
181 A4 Input Cell 
182 A5 Output Cell: Controlled by bit 213 
183 A5 Input Cell 
184 A6 Output Cell: Controlled by bit 213 
185 A6 Input Cell 
186 A7 Output Cell: Controlled by bit 213 
187 A7 Input Cell 
188 A8 Output Cell: Controlled by bit 213 
189 A8 Input Cell 
190 A9 Output Cell: Controlled by bit 213 
191 A9 Input Cell 
192 A10 Output Cell: Controlled by bit 213 
193 A10 Input Cell 
194 A11 Output Cell: Controlled by bit 213 
195 A11 Input Cell 
196 A12 Output Cell: Controlled by bit 213 
197 A12 Input Cell 
Table 4-9. Boundary Scan Register Bit Definitions (Model 1) (continued) 

Bit Pin Name Comments 

198 A13 Output Cell: Controlled by bit 213 
199 A13 Input Cell 

200 A14 Output Cell: Controlled by bit 213 
201 A14 Input Cell 

202 A15 Output Cell: Controlled by bit 213 
203 A15 Input Cell 

204 A16 Output Cell: Controlled by bit 213 
205 A16 Input Cell 

206 A17 Output Cell: Controlled by bit 213 
207 A17 Input Cell 

208 A18 Output Cell: Controlled by bit 213 
209 A18 Input Cell 

210 A19 Output Cell: Controlled by bit 213 
211 A19 Input Cell 

212 A20 Output Cell: Controlled by bit 213 
213 A20 Input Cell 

214 Control Direction Control. See Table 4-7. 
215 SCYC Output Cell: Controlled by bit 257 
216 RESET Input Cell 

217 BE7 Output Cell: Controlled by bit 257 
218 BE6 Output Cell: Controlled by bit 257 
219 BE5 Output Cell: Controlled by bit 257 
220 BE4 Output Cell: Controlled by bit 257 
221 BE3 Output Cell: Controlled by bit 257 
222 BE2 Output Cell: Controlled by bit 257 
223 BE1 Output Cell: Controlled by bit 257 
224 BE0 Output Cell: Controlled by bit 257 
225 W/R Output Cell: Controlled by bit 257 
226 HIT Output Cell 

227 CLK Clock 

228 ADSC Output Cell: Controlled by bit 257 
229 ADS Output Cell: Controlled by bit 257 
230 CACHE Output Cell: Controlled by bit 257 
231 BRDYC Input Cell 
Table 4-9. Boundary Scan Register Bit Definitions (Model 1) (continued) 

Bit Pin Name Comments 

232 BRDY Input Cell 

233 EAD Input Cell 

234 PWT Output Cell: Controlled by bit 257 
235 LOCK Output Cell: Controlled by bit 257 
236 PCD Output Cell: Controlled by bit 257 
237 WB/WT Input Cell 

238 HITM Output Cell 

239 KEN Input Cell 

240 AHOLD Input Cell 

241 BOFF Input Cell 

242 HLDA Output Cell 

243 HOLD Input Cell 

244 NA Input Cell 

245 EWBE Input Cell 

246 M/IO Output Cell: Controlled by bit 257 
247 FLUSH Input Cell 

248 A20M Input Cell 

249 BUSCHK Input Cell 

250 AP Output Cell: Controlled by bit 213 
251 AP Input Cell 

252 D/C Output Cell: Controlled by bit 257 
253 BREQ Output Cell 

254 SMIACT Output Cell 

255 PCHK Output Cell 

256 APCHK Output Cell 

257 PRDY Output Cell 

258 Control Direction Control. See Table 4-7. 
259 INV Input Cell 

260 FERR Output Cell 

261 IERR Output Cell 
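The address rows of the two tables differ only by an offset: Model 1 adds the BF1 input cell at bit 146, so every later cell moves up one position, and the second and third Control rows move from bits 213 and 257 to bits 214 and 258. The sketch below (hypothetical names) computes the output and input cells of an address pin from the pin order used in both tables, A21 through A31 followed by A3 through A20, and records the Control rows for each model.

```python
# Address pins in the order they appear in Tables 4-8 and 4-9.
ADDRESS_ORDER = [f"A{n}" for n in range(21, 32)] + [f"A{n}" for n in range(3, 21)]

# Output cell of A21 (first address cell) and the "Control" rows, per model.
ADDRESS_BASE = {0: 155, 1: 156}
CONTROL_CELLS = {0: (144, 213, 257), 1: (144, 214, 258)}


def address_cell_bits(pin: str, model: int) -> tuple[int, int]:
    """Return (output_cell, input_cell) for address pin A3..A31 on model 0 or 1."""
    if model not in ADDRESS_BASE:
        raise ValueError(f"unsupported model: {model}")
    index = ADDRESS_ORDER.index(pin)  # raises ValueError for a non-address pin
    output_cell = ADDRESS_BASE[model] + 2 * index
    return output_cell, output_cell + 1
```

For example, `address_cell_bits("A9", 1)` gives `(190, 191)`, one position above the Model 0 result `(189, 190)`.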
AMD-K5 Processor Software Development Guide 20007D/0—Sep1996 





Hardware Debug Tool (HDT) 





The Hardware Debug Tool (HDT), sometimes referred to as 
the debug port or Probe Mode, is a collection of signals, 
registers, and processor microcode that is enabled when external 
debug logic drives R/S Low or loads the processor’s Test Access 
Port (TAP) instruction register with the USEHDT instruction. 


Documentation on the HDT is available under nondisclosure 
agreement to test and debug developers. For information, 
contact your local AMD sales representative or field application 
engineer. 
Appendix A 


Cache 





The individual locations of all SRAM arrays on the AMD-K5 
microprocessor are accessible with the RDMSR and WRMSR 
instructions. To access an array location, load ECX with the 
Array Access MSR code (82h) and EDX with the array pointer 
(described below). EAX holds the data to be read or 
written. 
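As a minimal sketch of the register convention just described, the Python below only computes the values to load before RDMSR or WRMSR; it does not execute either instruction, which must run at privilege level 0. The `ArrayAccessRegs` type and `array_access_regs` function are hypothetical; only the MSR code 82h and the roles of ECX, EDX, and EAX come from the text.

```python
from dataclasses import dataclass
from typing import Optional

ARRAY_ACCESS_MSR = 0x82  # loaded into ECX for every array access


@dataclass(frozen=True)
class ArrayAccessRegs:
    ecx: int            # Array Access MSR code
    edx: int            # array pointer (array, column, index, word/dword)
    eax: Optional[int]  # data for WRMSR; None means a read (RDMSR fills EAX)


def array_access_regs(pointer: int, write_data: Optional[int] = None) -> ArrayAccessRegs:
    """Compute register values for one array read (RDMSR) or write (WRMSR)."""
    if not 0 <= pointer <= 0xFFFFFFFF:
        raise ValueError("array pointer must fit in EDX (32 bits)")
    if write_data is not None and not 0 <= write_data <= 0xFFFFFFFF:
        raise ValueError("write data must fit in EAX (32 bits)")
    return ArrayAccessRegs(ecx=ARRAY_ACCESS_MSR, edx=pointer, eax=write_data)
```

For a read, RDMSR is executed with these ECX and EDX values and the array entry is taken from EAX; per the note to Table A-2, EDX keeps the pointer rather than being cleared.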


A.1 Array Pointer Formats 





Note: The term “column” in this description refers to the “way”: 
one of the four blocks in a set of the 4-way set-associative 
array at a particular index. 


The array pointer in EDX specifies a particular array, column, 
index, and possibly word or dword, depending on the array to 
be accessed. 
Table A-1. Cache Array Pointer Formats 

DCACHE tag array index: 
Bits 29-28 column | Bits 27-19 NA | Bits 18-13 DCACHE index | Bits 12-8 NA | Bits 7-0 array to be accessed 

DCACHE data array index and dword index in block: 
Bits 29-28 column | Bits 27-19 NA | Bits 18-13 DCACHE data array index | Bits 12-10 dword index into the block | Bits 9-8 NA | Bits 7-0 array to be accessed 

ICACHE index and word, Model 0: 
Bits 29-28 column | Bits 27-20 NA | Bits 19-13 ICACHE index for all ICACHE arrays | Bits 12-9 ICACHE word (two instruction bytes + associated predecode information) | Bit 8 NA | Bits 7-0 array to be accessed 

ICACHE index and word, Model 1: 
Bits 29-28 column | Bits 27-20 NA | Bits 19-13 ICACHE index for all ICACHE arrays | Bit 12 packet select | Bit 11 NA | Bits 10-8 ICACHE word (two instruction bytes + associated predecode information) | Bits 7-0 array to be accessed 

4-Kbyte TLB: 
Bits 29-28 column | Bits 27-13 NA | Bits 12-8 TLB index | Bits 7-0 array to be accessed 

4-Mbyte TLB: 
Bits 29-28 column | Bits 27-8 NA | Bits 7-0 array to be accessed 
Notes: 
For the instruction cache and data cache, the index/dword/word fields line up with a normal address, except that they are shifted to 
the left by 8 bits. 
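The note above means a cache pointer can be built by masking the relevant address bits, shifting them left by 8, and merging in the column (bits 29-28) and the array ID (bits 7-0, from Table A-2). The sketch below assumes the AMD-K5 cache geometry of 32-byte lines, a 64-set data cache (address bits 10-5) and a 128-set instruction cache (address bits 11-5), and a Model 0 instruction-cache word of two bytes (address bits 4-1); all names are hypothetical.

```python
DCACHE_INDEX_MASK = 0x7E0  # address bits 10-5: 64 sets of 32-byte lines
DCACHE_DWORD_MASK = 0x01C  # address bits 4-2: dword within the line
ICACHE_INDEX_MASK = 0xFE0  # address bits 11-5: 128 sets of 32-byte lines
ICACHE_WORD_MASK = 0x01E   # address bits 4-1: two-byte word (Model 0 layout)


def _make_pointer(column: int, address_fields: int, array_id: int) -> int:
    """Merge column (bits 29-28), address fields (shifted left 8), array ID (7-0)."""
    if not 0 <= column <= 3:
        raise ValueError("column (way) must be 0-3")
    if not 0 <= array_id <= 0xFF:
        raise ValueError("array ID must fit in bits 7-0")
    return (column << 28) | (address_fields << 8) | array_id


def dcache_pointer(address: int, column: int, array_id: int,
                   with_dword: bool = False) -> int:
    """Pointer for a DCACHE array; set with_dword for the data array."""
    mask = DCACHE_INDEX_MASK | (DCACHE_DWORD_MASK if with_dword else 0)
    return _make_pointer(column, address & mask, array_id)


def icache_pointer_model0(address: int, column: int, array_id: int,
                          with_word: bool = False) -> int:
    """Pointer for a Model 0 ICACHE array; set with_word for the store array."""
    mask = ICACHE_INDEX_MASK | (ICACHE_WORD_MASK if with_word else 0)
    return _make_pointer(column, address & mask, array_id)
```

For example, `dcache_pointer(0x12345, 2, 0xEC)` selects column 2 of the data cache physical tag array at the set holding address 12345h and yields `0x200340EC`.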





Table A-2 defines the array identification values to use when 
accessing the various arrays. 





Table A-2. Cache Array Identification Values 
Bits 7-0 (MSB to LSB) Array to be Accessed 
00h Data Cache Array 
E1h Data Cache Linear Tag/Status Array 
ECh Data Cache Physical Tag Array 
E4h Instruction Cache Store Array 
E5h Instruction Cache Linear Tag Array 
EDh Instruction Cache Physical Tag Array 
E6h Instruction Cache Valid Bit Array 
E7h Instruction Cache Branch Prediction Array 
E8h Translation Lookaside Buffer 4-Kbyte Page Frame/Status Array 
Notes: 
Although EDX is normally cleared on RDMSR, it remains intact during array accesses. 
Table A-2. Cache Array Identification Values (continued) 

Bits 7-0 (MSB to LSB) Array to be Accessed 
E9h Translation Lookaside Buffer 4-Kbyte Linear Tag Array 
EAh Translation Lookaside Buffer 4-Mbyte Page Frame/Status Array 
EBh Translation Lookaside Buffer 4-Mbyte Virtual Tag Array 

A.2 AMD-K5 Model 0 Array Data Formats 





Table A-3. AMD-K5 Model 0 ICACHE Physical Tags 





Bits 31-21 Bit 20 Bits 19-0 
0 Valid Bit Tag (Physical Address 31-12) 
Table A-4. AMD-K5 Model 0 DCACHE Physical Tags 





Bits 31-23 Bits 22-21 Bits 20-0 
0 MESI (00=invalid, 01=shared, 10=modified, 11=exclusive) Tag (Physical Address 31-11) 
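Because the physical tag holds address bits 31-11 and the data cache set index corresponds to address bits 10-5 (32-byte lines), a value read from this array can be turned back into the physical address of the cached line. The decoder below is a sketch with hypothetical names; the MESI encoding is copied from Table A-4, and Table A-16 gives the same layout for Model 1.

```python
MESI_STATES = {0b00: "invalid", 0b01: "shared", 0b10: "modified", 0b11: "exclusive"}


def decode_dcache_physical_tag(eax: int, set_index: int) -> tuple[str, int]:
    """Return (MESI state, physical address of the 32-byte line)."""
    if not 0 <= set_index <= 63:
        raise ValueError("DCACHE set index is 6 bits (0-63)")
    state = MESI_STATES[(eax >> 21) & 0b11]  # bits 22-21
    tag = eax & 0x1FFFFF                     # bits 20-0 = physical address 31-11
    return state, (tag << 11) | (set_index << 5)
```

The `set_index` argument is the same 6-bit index that was used to select the entry when it was read.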
Table A-5. AMD-K5 Model 0 DCACHE Data 





Bits 31-0 
Data 
Table A-6. AMD-K5 Model 0 DCACHE Linear Tag 





Bit 27 | Bit 26 | Bit 25 | Bit 24 | Bit 23 | Bit 22 | Bit 21 | Bits 20-0 
PCD | PWT | Dirty Bit | User/Supervisor Bit | R/W Bit | 0 | Linear Valid Bit | Tag 

Table A-7. AMD-K5 Model 0 ICACHE Instructions 

Bit 25 | Bit 24 | Bit 23 | Bits 22-21 | Bits 20-13 | Bit 12 | Bit 11 | Bit 10 | Bits 9-8 | Bits 7-0 
start bit | end bit | prefix 1 bit | opcode map (rops/mrom) | byte 1 | start bit | end bit | prefix 0 bit | opcode map (rops/mrom) | byte 0 
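Reading Table A-7 as two identical 13-bit groups (byte 0 in bits 12-0 and byte 1 in bits 25-13, each holding the instruction byte, a 2-bit opcode map, and prefix, end, and start bits), a store-array entry can be split as sketched below. The field order within a group follows the byte-0 half of the table; the end-bit position for byte 1 is an assumption based on that symmetry, and all names are hypothetical. The sketch does not interpret the opcode-map encoding.

```python
from typing import NamedTuple


class PredecodedByte(NamedTuple):
    value: int       # the instruction byte itself
    opcode_map: int  # 2-bit opcode map field (rops/mrom), left undecoded
    prefix: bool
    end: bool
    start: bool


def _unpack_group(group: int) -> PredecodedByte:
    """Decode one 13-bit group: byte 7-0, map 9-8, prefix 10, end 11, start 12."""
    return PredecodedByte(
        value=group & 0xFF,
        opcode_map=(group >> 8) & 0b11,
        prefix=bool((group >> 10) & 1),
        end=bool((group >> 11) & 1),
        start=bool((group >> 12) & 1),
    )


def decode_icache_word_model0(eax: int) -> tuple[PredecodedByte, PredecodedByte]:
    """Return (byte 0, byte 1) from a Model 0 ICACHE store-array entry."""
    return _unpack_group(eax & 0x1FFF), _unpack_group((eax >> 13) & 0x1FFF)
```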





Table A-8. AMD-K5 Model 0 ICACHE Linear Tag 





Bits 19-0 
Linear Address 31-12 
Table A-9. AMD-K5 Model 0 ICACHE Valid Bits 





Linear tag valid bit | Byte-valid bits 


Table A-10. AMD-K5 Model 0 ICACHE Branch Prediction 


Predicted taken | Byte offset within block of last byte of predicted branch instruction | Column of predicted target | Index of predicted target | Target byte 


Table A-11. AMD-K5 Model 0 TLB 4-Kbyte Linear Tag 





Global valid bit | Dirty bit | User/supervisor bit | Read/write bit | Valid bit | Tag (linear address 31-17) 
Table A-12. AMD-K5 Model 0 TLB 4-Kbyte Physical Page Frame 

Page frame address 

Table A-13. AMD-K5 Model 0 TLB 4-Mbyte Virtual Tag 


Global valid bit | Dirty bit | User/supervisor bit | Read/write bit | Valid bit | Tag (linear address 31-22) 


Table A-14. AMD-K5 Model 0 TLB 4-Mbyte Physical Page Frame 

Page frame address 


A.3 AMD-K5 Model 1 Array Data Formats 





Table A-15. AMD-K5 Model 1 ICACHE Physical Tags 





Bits 31-21 Bit 20 Bits 19-0 
0 Valid Bit Tag (Physical Address 31-12) 
Table A-16. AMD-K5 Model 1 DCACHE Physical Tags 





Bits 31-23 Bits 22-21 Bits 20-0 
0 MESI (00=invalid, 01=shared, 10=modified, 11=exclusive) Tag (Physical Address 31-11) 
Table A-17. AMD-K5 Model 1 DCACHE Data 





Bits 31-0 
Data 

Table A-18. AMD-K5 Model 1 DCACHE Linear Tag 

Bit 28 | Bit 27 | Bit 26 | Bit 25 | Bit 24 | Bit 23 | Bit 22 | Bit 21 | Bits 20-0 
WB | PCD | PWT | Dirty Bit | User/Supervisor Bit | R/W Bit | 0 | Linear Valid Bit | Tag 

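A Model 1 data cache linear tag entry can be unpacked field by field as in the following sketch (hypothetical names). It returns the raw bit values from the Table A-18 layout without interpreting them; bit 22, shown as 0 in the table, is ignored.

```python
from typing import NamedTuple


class DcacheLinearTagM1(NamedTuple):
    wb: bool     # bit 28, WB
    pcd: bool    # bit 27, PCD
    pwt: bool    # bit 26, PWT
    dirty: bool  # bit 25, Dirty Bit
    us: bool     # bit 24, User/Supervisor Bit
    rw: bool     # bit 23, R/W Bit
    valid: bool  # bit 21, Linear Valid Bit
    tag: int     # bits 20-0, Tag


def decode_dcache_linear_tag_model1(eax: int) -> DcacheLinearTagM1:
    """Split a Model 1 DCACHE linear tag entry into its fields."""
    def bit(n: int) -> bool:
        return bool((eax >> n) & 1)

    return DcacheLinearTagM1(
        wb=bit(28), pcd=bit(27), pwt=bit(26), dirty=bit(25),
        us=bit(24), rw=bit(23), valid=bit(21), tag=eax & 0x1FFFFF,
    )
```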
Table A-19. AMD-K5 Model 1 ICACHE Instructions 

Bit 25 | Bit 24 | Bit 23 | Bits 22-21 | Bits 20-13 | Bit 12 | Bit 11 | Bit 10 | Bits 9-8 | Bits 7-0 
start bit | end bit | prefix 1 bit | opcode map (rops/mrom) | byte (n+8) | start bit | end bit | prefix 0 bit | opcode map (rops/mrom) | byte (n) 

Table A-20. AMD-K5 Model 1 ICACHE Linear Tag 
Bit 22 | Bit 21 | Bit 20 | Bits 19-0 
0 | Linear Valid Bit | User/Supervisor Bit | Linear Address 31-12 

Table A-21. AMD-K5 Model 1 ICACHE Valid Bits 

Bits 31-0 
Byte-valid bits 

Table A-22. AMD-K5 Model 1 ICACHE Branch Prediction 

Bits 31-19 
Predicted taken | Byte offset within block of last byte of predicted branch instruction | Column of predicted target | Index of predicted target 

Table A-23. AMD-K5 Model 1 TLB 4-Kbyte Linear Tag 

Global valid bit | Dirty bit | User/supervisor bit | Read/write bit | Valid bit | Tag (linear address 31-17) 


Table A-24. AMD-K5 Model 1 TLB 4-Kbyte Physical Page Frame 
Page frame address 
Table A-25. AMD-K5 Model 1 TLB 4-Mbyte Virtual Tag 
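Bits 31-15 | Bit 14 | Bit 13 | Bit 12 | Bit 11 | Bit 10 | Bits 9-0 
0 | Global valid bit | Dirty bit | User/supervisor bit | Read/write bit | Valid bit | Tag (linear address 31-22) 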


Table A-26. AMD-K5 Model 1 TLB 4-Mbyte Physical Page Frame 

Page frame address 